Psychometrika 





CONTENTS

RELIABILITY FOR THE LAW OF COMPARATIVE JUDGMENT
HAROLD GULLIKSEN AND JOHN W. TUKEY

AN INTER-BATTERY METHOD OF FACTOR ANALYSIS
LEDYARD R TUCKER

COMPARATAL DISPERSION, A MEASURE OF ACCURACY OF JUDGMENT
HAROLD GULLIKSEN

APPLICATION OF THE QUARTIMAX METHOD OF ROTATION TO THURSTONE’S
PRIMARY MENTAL ABILITIES STUDY
CHARLES WRIGLEY, DAVID R. SAUNDERS, AND JACK O. NEUHAUS

A DISTINCTION BETWEEN EXACT AND APPROXIMATE NONPARAMETRIC METHODS
WILLIAM L. SAWREY

BOOK REVIEWS

LEE J. CRONBACH AND GOLDINE C. GLESER. Psychological Tests and
Personnel Decisions
Review by BERT F. GREEN, JR.

JOE K. ADAMS. Basic Statistical Concepts
Review by ROBERT E. MORIN

HENRY E. GARRETT. Elementary Statistics
Review by ROBERT E. MORIN

W. ALLEN WALLIS AND HARRY V. ROBERTS. Statistics: A New Approach
Review by CHARLES L. WOOD

JEROME S. BRUNER, JACQUELINE J. GOODNOW, AND GEORGE A. AUSTIN.
A Study of Thinking
Review by ROBERT GLASER








VOLUME TWENTY-THREE JUNE 1958 NUMBER 2 








PSYCHOMETRIKA, VOL. 23, NO. 2
JUNE, 1958


RELIABILITY FOR THE LAW OF COMPARATIVE JUDGMENT* 


HAROLD GULLIKSEN 


PRINCETON UNIVERSITY 
AND 
EDUCATIONAL TESTING SERVICE 
AND 
JOHN W. TUKEY


PRINCETON UNIVERSITY 


A variance-components analysis is presented for paired comparisons
in terms of three components: s, the scale value of the stimuli; d, a deviation
from the linear model specified by the law of comparative judgment; and b,
a binomial error component. Estimates are given for each of the three
variances, $\sigma_s^2$, $\sigma_d^2$, and $\sigma_b^2$. Several coefficients, analogous to
reliability coefficients, based on these three variances are indicated. The
techniques are illustrated in a replicated comparison of handwriting specimens.


In studies using the method of paired comparisons and the law of com- 
parative judgment, it is desirable to determine the reliability of the scales 
which are obtained. For a given set of data one might like to know the extent 
to which the law of comparative judgment is successful in accounting for the 
total variance in the data. 

Mosteller [13] has outlined a chi square test of the agreement between
the fitted proportions, p*, and the observed proportions, p; such a test labels
the discrepancy between observation and theory as either “significant” or
“nonsignificant” but does not indicate whether the variance accounted for
by the theory is large or small in relation to the total variance in the data.

This property of significance tests is well known and has been clearly 
stated by Cochran [3] in his discussion of the chi square test: 


The power of the test to detect an underlying disagreement between theory
and data is controlled largely by the size of the sample. With a small sample
an alternative hypothesis which departs violently from the null hypothesis
may still have a small probability of yielding a significant value of $\chi^2$. In a
very large sample, small and unimportant departures from the null hypothesis
are almost certain to be detected.


*This research was jointly supported in part by Princeton University, the
Office of Naval Research under contract Nonr-1858(15), and the National
Science Foundation under grant NSF G-642, and in part by Educational Testing
Service. Reproduction in whole or in part is permitted for any purpose of the
United States Government. Thanks are due to Ledyard Tucker and Frederic Lord
for valuable suggestions on the development presented here.

If the sample is small then the $\chi^2$ test will show that the data are “not
significantly different from” quite a wide range of very different theories,
while if the sample is large, the $\chi^2$ test will show that the data are significantly
different from those expected on a given theory even though the difference
may be so very slight as to be negligible or unimportant on other criteria.
Fisher [6] gives a good illustration of this point in his analysis of Weldon’s 
data on dice throws. If we test the theory that a throw of 5 or 6 has a prob- 
ability of 1/3, then chi square for Weldon’s data is very large, with p of 
.0001. However, a very slight change in the theory—from a probability of 
.3333 to a probability of .3377—gives a quite reasonable chi square with a 
p value of .3 or .4. 
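The dependence of chi square on sample size can be illustrated with a minimal sketch. It is not a reanalysis of Weldon's data: it assumes a single binomial proportion tested with one degree of freedom, it borrows only the two probabilities quoted above (1/3 and .3377), and the sample sizes are hypothetical.

```python
def binomial_chi_square(p_observed, p_theory, n):
    """Chi square (1 d.f.) for an observed proportion against a theoretical one.

    Equivalent to summing (observed - expected)^2 / expected over the two
    cells "success" and "failure" for n independent trials.
    """
    return n * (p_observed - p_theory) ** 2 / (p_theory * (1.0 - p_theory))


# Hypothetical sample sizes; the discrepancy .3377 versus 1/3 is held fixed.
sample_sizes = [1_000, 100_000, 1_000_000]
chi_squares = {n: binomial_chi_square(0.3377, 1.0 / 3.0, n) for n in sample_sizes}

for n, value in chi_squares.items():
    print(f"n = {n:>9,d}   chi square = {value:8.3f}")
```

The chi square grows in direct proportion to n, so the same small discrepancy falls well below the 5 per cent point for one degree of freedom (3.84) at the smallest sample size and far above it at the largest.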

In order to proceed appropriately in any scientific investigation it is 
likely to be necessary to answer two different questions. 


i. Is it reasonable to say that random variation accounts for the difference
between theory and data?

ii. How large is this difference relative to the variation that is accounted
for by the theory?


In studying the applicability of the law of comparative judgment, 
variance-component and analysis of variance techniques can provide ap- 
propriate answers to these questions by methods outlined below and there 
applied to two sets of data on handwriting specimens and to Mosteller’s 
[13] baseball data. 


The Data of the Example 


The handwriting specimens were chosen from the Ayres [1] handwriting 
scale. This scale consists of a series of handwriting specimens of nine different 
scale levels, numbered from 10 (the lowest) to 90 (the highest). Each of 
these scale values is represented by three specimens, a “vertical” style (a),
a normal slant (b), and an extreme slant (c). Thus the scale consists of 27
different handwriting specimens. In conventional use, a handwriting specimen 
to be scaled is judged to be like one of the scale specimens or to fall between 
two of them. Thus, specimens can be scaled 10 to 90. The extremely bad or 
good ones might be either below 10 or above 90, respectively. Nine of these 
handwriting specimens were chosen for the present experiment: 50a, 50b, 50c, 
70a, 70b, 70c, 80a, 80b, and 80c. These specimens are shown in Figure 1. 
The 36 possible pairs for these nine specimens were arranged in a booklet, 
with instructions for the judge to pick the better member of each pair. 

[FIGURE 1. The nine handwriting specimens used in the experiment.]

It is interesting to note that one can easily develop a discussion in a class
in measurement to indicate that there are numerous criteria on which it is
possible to judge these handwriting specimens; the class will rather readily
reach the conclusion that any set of judgments would be meaningless, highly
unreliable, and unduplicatable unless one defined in great verbal detail
exactly what characteristic was to be judged, instead of simply using the term
“better handwriting.” In the late 1930’s this schedule was given without
preliminary discussion of the problem to 100 students at the University of
Chicago, and in the late 1940’s it was given, again without preliminary
discussion, to 100 students at Princeton University. The data, p, the observed
proportions, are shown in Table 1. The agreement between these two sets of
judgments for 100 people taken in different institutions about ten years
apart is rather striking.

For any pair of stimuli, i and j, the probability of a choice, p, is
approximately given by the integral of the normal curve which corresponds to
the difference of scale values interpreted as a normal deviate, fitted according
to Thurstone [14, 15] or Mosteller [13].
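The relation between scale values and proportions can be sketched as follows. This is an illustration only: it assumes the equal-dispersion form of the law, in which the fitted proportion is the normal integral of the difference in scale values, and it assumes the least-squares solution in which each scale value is a row mean of normal deviates. The scale values are hypothetical and the function names are ours.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def fitted_proportions(scale_values):
    """p*[i][j] = Phi(s_i - s_j): probability that stimulus i is chosen over j."""
    k = len(scale_values)
    return [
        [_STD_NORMAL.cdf(scale_values[i] - scale_values[j]) for j in range(k)]
        for i in range(k)
    ]


def case_v_scale(proportions):
    """Least-squares scale values from a k x k matrix of proportions.

    Each off-diagonal proportion is converted to a normal deviate; the scale
    value of stimulus i is the sum of row i divided by k (the diagonal
    deviate is zero). The lowest stimulus is then shifted to zero.
    """
    k = len(proportions)
    row_means = []
    for i in range(k):
        deviates = [_STD_NORMAL.inv_cdf(proportions[i][j]) for j in range(k) if j != i]
        row_means.append(sum(deviates) / k)
    lowest = min(row_means)
    return [value - lowest for value in row_means]


# Hypothetical scale values for three stimuli (not taken from Table 2).
true_scale = [0.0, 0.4, 1.0]
p_star = fitted_proportions(true_scale)
recovered = case_v_scale(p_star)
```

When the proportions follow the model exactly, as they do here by construction, the fitted scale reproduces the generating scale values; with observed proportions the two differ, and that difference is what the residual analysis below measures.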

The two sets of scale values obtained from utilizing the law of com- 
parative judgment as stated by Thurstone [14, 15] are shown in Table 2. 
In both of these scales, stimulus 50a, the poorest one, has been chosen as 
having a scale value of zero. The fitted proportions, p*, computed from 
these scale values are given in Table 3. The scale values for the total group, 
given in Table 2, are found by summing the frequencies for the two groups 
and then proceeding to scale as for the single groups. 

When Mosteller’s [13] chi square test for goodness of fit is applied to
these data one finds (see Table 5, $\chi^2_D$) a chi square of about 74 for the Chicago
data, 76 for the Princeton data, and 127 for the two groups combined. The
corresponding p values are each less than .0001, the chi square value at the
.01 level being only 48. Thus, the conclusion reached would be that the data
are not fully accounted for by the law of comparative judgment. However, 
it is interesting and meaningful to know whether the fraction of the systematic 
variation which is not accounted for should be regarded as approximately 1 
or 2 per cent or as much as 75 per cent. For example, if an aptitude test has 
a validity coefficient of .5 for predicting some criterion, it is considered a very 
useful test, even though it is also true that 75 per cent of the variance in the 
criterion is not accounted for by the test. Under such circumstances it would 
doubtless be true that the criterion contains a significant nonrandom com- 
ponent that is different from anything represented by the test. Analysis of 
variance and variance-component analysis procedures will give information 
on the percentage of the variance which is accounted for and on the per- 
centage which remains to be accounted for after the law of comparative 
judgment has been utilized, and will thus give coefficients which are analogous 
to reliabilities. For various illustrations of analysis of components of variance
see, for example, Mood [12], Bennett and Franklin ([2], Chapter 7), Davies’
[4] discussion of “expectation of mean square” beginning in Chapter 4,
Duncan [5], especially Chapters 23 and 24, or Tippett’s [16] discussion of
substantive variances in Chapters 6 and 7.
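The significance levels quoted for the residual chi squares can be checked with a short sketch. It assumes 28 degrees of freedom for each of the three values and uses the closed form of the chi square tail that holds when the number of degrees of freedom is even; the function name is ours.

```python
import math


def chi_square_tail_even_df(x, df):
    """P(chi square > x) for an even number of degrees of freedom.

    For df = 2m the tail is exp(-x/2) * sum_{i<m} (x/2)^i / i!, the
    probability that a Poisson variate with mean x/2 is less than m.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Residual chi squares quoted in the text, each taken with 28 degrees of freedom.
quoted = {"Chicago": 74.0, "Princeton": 76.0, "combined": 127.0}
tail = {name: chi_square_tail_even_df(x, 28) for name, x in quoted.items()}

for name, p_value in tail.items():
    print(f"{name:>9s}: p = {p_value:.2e}")
```

All three tail probabilities fall below .0001, in agreement with the statement above, while a chi square near 48 corresponds to the .01 level.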


[TABLE 1. Experimental Proportions (p)]

[TABLE 2. Scale Values for Handwriting Specimens. C = Chicago data; P = Princeton data.]

[TABLE 3. Theoretical Proportions (p*) Computed from Scale Values in Table 2]


Framework of the Analysis


Since we are dealing with proportions, the sampling variance is a function
of the true proportion as well as of the sample size, $N\sigma_p^2 = \pi(1 - \pi)$.
If the analysis is conducted in terms of an angular transform of each
proportion, then the binomial sampling variance is a function primarily of N,
and not of the true proportion. The angular transform of the data is defined
on different scales by different authors. The simplest scale for our purposes
is that used by Hald [9] in his table, where $\theta = 2 \arcsin \sqrt{p}$; the arc is
expressed in radians.
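As a small illustration of the radian form of the transform, the following sketch computes the angle for a proportion and inverts it. The function names are ours.

```python
import math


def angular_transform(p):
    """theta = 2 arcsin(sqrt(p)), in radians, for a proportion 0 <= p <= 1."""
    return 2.0 * math.asin(math.sqrt(p))


def inverse_angular_transform(theta):
    """Recover the proportion from theta (radians): p = sin^2(theta / 2)."""
    return math.sin(theta / 2.0) ** 2


# Complementary proportions are symmetric about pi/2 = 1.5708.
theta_30 = angular_transform(0.30)
theta_70 = angular_transform(0.70)
print(round(angular_transform(0.50), 4))   # 1.5708
print(round(theta_30 + theta_70, 4))       # 3.1416
```

Because the angles for p and 1 - p always sum to pi, a complete set of angles for all ordered pairs has a mean of pi/2.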

The variance of $\theta$ is 1/N approximately, for proportions not too near
1 or 0. If Np and N(1 − p) both exceed 4 or 5, the approximation is quite
good. Even more extreme cases may be analyzed by the use of the averaged
angular transformation, Freeman and Tukey [8], which will be satisfactory
for Np, N(1 − p) > 1. In the other common version, tabled by Fisher and
Yates [7], $\theta_F = \arcsin \sqrt{p}$, the arc is expressed in degrees.

The variance of $\theta_F$ is approximately 821/N for proportions not too close
to 1 or 0. Thus if p = .50, $\theta = \pi/2 = 1.5708$, while $\theta_F = 45.00$. In general,

$$\theta_F = \frac{45.00}{1.5708}\,\theta = \theta\sqrt{821}.$$

If tables of $\theta_F$ are used, then, in order to fit into the pattern of Table 4, the
resulting sums of squares should be divided by 821.
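The two variance approximations can be checked numerically. The sketch below computes the exact variance of each transformed proportion under binomial sampling by enumerating every possible count. The choice of N = 100 and a true proportion of .5 is illustrative, and the function names are ours.

```python
import math


def theta_radians(p):
    """Hald's scale: 2 arcsin(sqrt(p)) in radians."""
    return 2.0 * math.asin(math.sqrt(p))


def theta_degrees(p):
    """Fisher and Yates' scale: arcsin(sqrt(p)) in degrees."""
    return math.degrees(math.asin(math.sqrt(p)))


def exact_variance(n, p_true, transform):
    """Variance of transform(X / n) when X is binomial(n, p_true)."""
    mean = 0.0
    mean_square = 0.0
    for x in range(n + 1):
        weight = math.comb(n, x) * p_true**x * (1.0 - p_true) ** (n - x)
        value = transform(x / n)
        mean += weight * value
        mean_square += weight * value * value
    return mean_square - mean * mean


N = 100  # illustrative number of judges per pair
var_rad = exact_variance(N, 0.5, theta_radians)
var_deg = exact_variance(N, 0.5, theta_degrees)
print(round(N * var_rad, 3), round(N * var_deg, 1))
```

The product of N and the variance should be close to 1 on the radian scale and close to 821 on the degree scale, the ratio of the two being (90/pi) squared.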

The convenience of an analysis in terms of $\theta$ values lies in the fact that
for pure binomial variation the variance of any $\theta$ is substantially equal to
the reciprocal of the number of observations on which the p is based. This
property of the angular transformation allows the definition of modified
chi squares, such as the one used by Mosteller, which do not require
denominators. When necessary, we shall distinguish these modified chi squares
as angular chi squares.

For each ordered pair of stimuli, i and j, we have an observed angle $\theta$
corresponding to the observed p’s of Table 1, and a fitted angle $\theta^*$ derived
from the fitted scale and corresponding to the fitted p*’s of Table 3. Because
of the symmetry of the situation the mean of the complete set of p’s, or that
of the p*’s, is .50. Correspondingly, the mean of any complete set of $\theta$’s and
the mean of any complete set of $\theta^*$’s equals 1.5708.

Using angles, the analysis of variance is given in terms of the following 
definitions, where the arc is measured in radians: 





$$\theta = 2 \arcsin \sqrt{p} \quad \text{(observed values)};$$
$$\theta^* = 2 \arcsin \sqrt{p^*} \quad \text{(fitted values)};$$
$$\bar{\theta} = 1.5708 = 2 \arcsin \sqrt{.5}.$$


If all the stimuli are identical and are judged to be identical, then the
proportion of judgments i greater than j would be .5 in every case.
We treat the observed angles $\theta$ as if they were a sum of three types of
contributions. This treatment is approximate in two ways. First, as Mosteller
([13], p. 213) was careful to point out in connection with his chi square, the
fitting used is a least square fit on the normal scale but not on the angular
scale. Consequently, residuals on the angular scale will not be as small as
those resulting from a fitting procedure tailored to the angular scale. As a
consequence, our estimated “reliability” coefficients will be somewhat smaller,
just as Mosteller’s chi squares are somewhat larger, than those obtainable
from more closely tailored fits. Second, the imperfect linearity of the relation
of angles to normal deviates implies that the true scale difference for any
pair compared is, when measured in angles, only approximately a difference.
For the purpose of defining variance components and reliabilities this latter
effect should not be quantitatively important. We shall use these
approximations freely, usually without further ado. Let us return to the three types
of contributions associated with a single comparison (as of two specimens of
handwriting) and contributing to the observed angle.

One contribution is approximately the difference between the true scale
values for the two stimuli, say $s_i - s_j$. These s values may be thought of as
drawn from a population with variance $\sigma_s^2$. Hence the values in the cells,
$s_i - s_j$, are regarded as drawn from a population with variance $2\sigma_s^2$.

Another is a deviations component, designated d, due to the deviations
of the data from the linear scaling model used. These d values are treated
as if they were drawn from a population with variance $\sigma_d^2$ and average zero.
(They are, of course, fixed by the selection of stimuli and constitute a set of
numbers defined for $i \ne j$, with

$$d_{ij} = -d_{ji}, \qquad \sum_{i \ne j} d_{ij}^2 = k(k-1)\sigma_d^2, \qquad \text{and} \qquad \sum_{j \ne i} d_{ij} = 0 \ \text{ for each } i.)$$

Due to the fact that we are dealing with values determined from
proportions, we have a binomial error component, say b. These values are drawn
from a population with variance $\sigma_b^2$. Since we are working with angular
transforms it is not exactly true that $E(b_{ij})$ is zero, but zero is a satisfactory
approximation. It should be noted that it has been assumed that all subjects
are drawn from the same population so that in this approach no allowance
is made for stable individual differences in preferences among the subjects.
(It is surely relevant to consider carefully the case in which no stable
individual differences exist before proceeding to the more complex analysis.)

Thus we have the approximate composition of the observed values and
the associated variance of the population from which each of these three
quantities may be thought of as drawn, as follows:

$$\theta_{ij} = (s_i - s_j) + d_{ij} + b_{ij} + \bar{\theta}.$$

The population variances of these three components are respectively $2\sigma_s^2$,
$\sigma_d^2$, and $\sigma_b^2$. When the data are analyzed, the deviation of the observed $\theta$
from their mean, designated $\bar{\theta}$, is easily separated into two parts, one a
linear component in agreement with the law of comparative judgment, the
other a residual component, as follows:

$$\underbrace{(\theta_{ij} - \bar{\theta})}_{\text{total}} = \underbrace{(\theta^*_{ij} - \bar{\theta})}_{\text{linear}} + \underbrace{(\theta_{ij} - \theta^*_{ij})}_{\text{residual}}.$$

Correspondingly we have the three sums of squares.

$$\text{Total} \qquad S_T = \tfrac{1}{2} \sum_{i \ne j} (\theta_{ij} - \bar{\theta})^2,$$
$$\text{Linear} \qquad S_L = \tfrac{1}{2} \sum_{i \ne j} (\theta^*_{ij} - \bar{\theta})^2,$$
$$\text{Residual} \qquad S_D = \tfrac{1}{2} \sum_{i \ne j} (\theta_{ij} - \theta^*_{ij})^2.$$


It may be noted that s, d, and b all affect the linear component (and also the
total), while the residual is not affected by s, but only by d and b. This
separation can now be used as the basis for an analysis of variance.

It should be noted that we are implicitly using the approximation

$$\theta^*_{ij} = \frac{1}{k}\Bigl(\sum_{h \ne i} \theta_{ih} - \sum_{h \ne j} \theta_{jh}\Bigr) + \bar{\theta}.$$

The actual $\theta^*_{ij}$, as in Mosteller’s paper, is obtained as the angular
transform of $p^*_{ij}$, found from an empirical law of comparative judgment scale
fitted via the normal transform.

Because of the nature of the fitting process, and because of the slightly
nonlinear relation between angles and normal deviates, the deviations of the
observations from their means have been separated into two parts which are
not formally “orthogonal.” There is no necessity for

$$\sum_{i \ne j} (\theta_{ij} - \theta^*_{ij})(\theta^*_{ij} - \bar{\theta})$$

to vanish. Consequently the two expressions for the sum of squares associated
with the fit according to the law of comparative judgment,

$$\tfrac{1}{2} \sum_{i \ne j} (\theta^*_{ij} - \bar{\theta})^2 = S_L,$$

and

$$\tfrac{1}{2} \sum_{i \ne j} (\theta_{ij} - \bar{\theta})^2 - \tfrac{1}{2} \sum_{i \ne j} (\theta_{ij} - \theta^*_{ij})^2 = S_T - S_D,$$

need not be precisely the same. So long as these give substantially the same
answer, we may use either $S_L$ or $S_T - S_D$ in assessing a “reliability” without
serious error. Should they differ widely, reconsideration of the fitting would
be in order.
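The three sums of squares and the cross-product term can be computed directly from the matrices of observed and fitted angles. The sketch below assumes that each sum runs over all ordered pairs and is halved, so that every unordered pair counts once; the angles are hypothetical and the helper names are ours.

```python
import math

THETA_BAR = math.pi / 2.0  # 2 arcsin(sqrt(.5)) = 1.5708


def sums_of_squares(theta, theta_star):
    """Return (S_T, S_L, S_D, cross) from k x k matrices of angles.

    Entries with i == j are ignored. Each sum of squares runs over ordered
    pairs i != j and is halved. `cross` is the unhalved sum of products of
    the residual and linear parts; it is zero only for an orthogonal split.
    """
    k = len(theta)
    s_total = s_linear = s_resid = cross = 0.0
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            linear = theta_star[i][j] - THETA_BAR
            resid = theta[i][j] - theta_star[i][j]
            s_total += 0.5 * (theta[i][j] - THETA_BAR) ** 2
            s_linear += 0.5 * linear**2
            s_resid += 0.5 * resid**2
            cross += resid * linear
    return s_total, s_linear, s_resid, cross


def antisymmetric(upper):
    """Build a k x k angle matrix from offsets {(i, j): theta_ij - THETA_BAR}, i < j."""
    k = 1 + max(j for _, j in upper)
    matrix = [[THETA_BAR] * k for _ in range(k)]
    for (i, j), offset in upper.items():
        matrix[i][j] = THETA_BAR + offset
        matrix[j][i] = THETA_BAR - offset
    return matrix


# Hypothetical observed and fitted angles for three stimuli.
observed = antisymmetric({(0, 1): -0.30, (0, 2): -0.95, (1, 2): -0.50})
fitted = antisymmetric({(0, 1): -0.35, (0, 2): -0.90, (1, 2): -0.55})
S_T, S_L, S_D, cross = sums_of_squares(observed, fitted)
```

In general the total equals the linear sum plus the residual sum plus the cross-product term, so the size of `cross` relative to the total shows how far the linear sum and the total-minus-residual difference will disagree.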

The linear, residual, and total mean squares, together with the number
of judges, N, and the number of stimuli, k, may be used to give estimates of
the variances as follows:

$$\text{Total mean square} \qquad T = \frac{2S_T}{k(k-1)} = \text{est}\,(2\sigma_s^2 + \sigma_d^2 + \sigma_b^2);$$
$$\text{Residual mean square} \qquad D = \frac{2S_D}{(k-1)(k-2)} = \text{est}\,(\sigma_d^2 + \sigma_b^2);$$
$$\text{Binomial mean square} \qquad \frac{1}{N} = \text{est}\,\sigma_b^2;$$
$$\text{Linear mean square} \qquad L = \frac{S_L}{k-1} = \text{est}\,(k\sigma_s^2 + \sigma_d^2 + \sigma_b^2).$$

It should be noted, as pointed out above, that we also have another possible
value for the linear mean square given by

$$\frac{S_T - S_D}{k-1} \approx \text{est}\,(k\sigma_s^2 + \sigma_d^2 + \sigma_b^2).$$

We may also define an associated set of chi squares as follows:

$$\chi^2_T = N S_T, \qquad \chi^2_L = N S_L,$$
$$\chi^2_{T-D} = N(S_T - S_D), \qquad \chi^2_D = N S_D.$$


The basic formulas for the associated analyses are summarized in Table 4. 

Starting with the observed values, p, and fitted values, p*, the values
of $\theta$ and $\theta^*$ are found. These are used to compute $S_T$, $S_D$, and $S_L$, the sums
of squares. From these we get the mean square values designated T, D, and
L. These are used to give the estimates of variance components and
“reliabilities.”
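The step from sums of squares to mean squares and variance components can be sketched as follows. The component estimates are obtained here by solving the expected mean squares given above, which we assume is what Table 4 prescribes. The input values are hypothetical, chosen to be of the same order as the handwriting example, and the function name is ours.

```python
def variance_components(s_total, s_linear, s_resid, k, n_judges):
    """Mean squares, variance-component estimates, and residual chi square.

    k is the number of stimuli and n_judges the number of judges per pair.
    """
    t = 2.0 * s_total / (k * (k - 1))          # total mean square
    d = 2.0 * s_resid / ((k - 1) * (k - 2))    # residual mean square
    l = s_linear / (k - 1)                     # linear mean square
    l_alt = (s_total - s_resid) / (k - 1)      # alternative linear mean square
    sigma_b2 = 1.0 / n_judges                  # binomial component
    return {
        "T": t,
        "D": d,
        "L": l,
        "L_alt": l_alt,
        "sigma_b2": sigma_b2,
        "sigma_d2": d - sigma_b2,              # from est(sigma_d2 + sigma_b2) = D
        "sigma_s2": (l - d) / k,               # from est(k sigma_s2 + ...) = L
        "sigma_s2_alt": (l_alt - d) / k,
        "chi2_D": n_judges * s_resid,          # angular chi square for the residual
    }


# Hypothetical sums of squares for k = 9 stimuli and N = 100 judges.
example = variance_components(s_total=26.8, s_linear=25.6, s_resid=0.7448,
                              k=9, n_judges=100)
for name, value in example.items():
    print(f"{name:>12s} = {value:.4f}")
```

A negative estimate for the deviations component can occur when the residual mean square falls below 1/N; it is then usually read as indicating that no deviations from the linear model are detectable.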

The application of the procedure indicated in Table 4 to the data of
Table 1 gives the results indicated in Table 5. In Table 5 the values obtained
for the Chicago group are indicated by C, the values for the Princeton group
by P, and the values obtained by pooling the numbers of judgments for the
two groups are indicated by T. The data on baseball teams presented by
Mosteller [13] are indicated by B.

The results show consistency in the variance components. Three estimates
of the linear component are available in the handwriting experiment, 0.3521
(Chicago), 0.2868 (Princeton), and 0.3115 (combined). Three estimates are
similarly available of a “deviations from scalability” component, 0.0166
(Chicago), 0.0171 (Princeton), and 0.0176 (combined). In comparison with the
linear component the deviations components are small and agree unusually
well among themselves. This fact suggests that we have systematic and
consistent, though small, deviations from the law of comparative judgment.


[TABLE 4. Outline of Procedures. The table has three parts: Analysis of
Variance (source of variation, degrees of freedom, sum of squares, mean
square, average value of mean square, angular chi square); Variance
Components Analysis (source of variation, symbol for variance component,
estimate of variance component); and “Reliability” (r) of Scale.
≈ means here that average values are equal.
k = number of items, all pairs of which are compared.
N = number of judges for each pair.]


Variance Ratios


In dealing with psychological tests many different sets of variance ratios
have been used, giving various types of validity and reliability coefficients
each having somewhat different properties and serving somewhat different
purposes. In general each coefficient is the ratio of a measure of “true variance”
to a measure of “observed variance” which includes both “true variance”
and “error variance.” One reasonable interpretation for paired comparisons
is to regard the linear component, $2\sigma_s^2$, as “true variance” and the other
two components, $\sigma_d^2 + \sigma_b^2$, as error variance, so that we may define a coefficient
of linear consistency by

$$r_1 = \frac{2(L - D)}{kT}.$$


The factor 2 arises from the fact that $\sigma_s^2$ was normalized in terms of
individual stimuli, while $\sigma_d^2$ and $\sigma_b^2$ are normalized in terms of differences.
That is, $\sigma_s^2$ is the variance of the k different s values, while the variance of
the k(k − 1) values $s_i - s_j$ is $2\sigma_s^2$, and the observed variance for the cell
entries is $2\sigma_s^2 + \sigma_d^2 + \sigma_b^2$.

If the linear sum of squares is taken as $S_T - S_D$, instead of $S_L$, then we
have another estimate for the coefficient of linear consistency,

$$r_{11} = \frac{2}{kT}\left(\frac{S_T - S_D}{k - 1} - D\right).$$

These coefficients, $r_1$ and $r_{11}$, indicate the extent to which the linear
model, as represented by the fitted values $\theta^*$, fits the observed cell entries,
given by $\theta$. For example, if the agreement is perfect, then $S_D$ and D will
equal zero, and $S_L$ will equal $S_T$, which means that 2L/k = T so that $r_1 =
r_{11} = 1.00$. If, on the other hand, the mean squares T, L, and D are all equal,
then $r_1 = r_{11} = 0.00$. These coefficients $r_1$ and $r_{11}$ are regarded as similar to
$r_{s\theta}^2$, the square of the correlation between observed and true values assuming
the linear model. Alternatively, $r_1$ and $r_{11}$ may be regarded as representing
the correlation between two sets of observed values provided their correlation
is entirely accounted for by the true values, assuming a linear model. The
coefficients $r_1$ or $r_{11}$ may be regarded as appropriate to the recomparison of
a randomly selected pair of the nine handwriting specimens against a
background of seven other specimens covering the same range of merit and hence
drawn from a population having the same $\sigma_s^2$ as the specimens used in this
experiment. For example, if another set of three specimens each of values
50, 70, and 80 were scaled, a similar $\sigma_s^2$ would be expected; if $\sigma_d^2$ and $\sigma_b^2$ also
remained about the same, a similar degree of agreement between fitted and
observed values, i.e., a similar coefficient of linear consistency, would be
expected.

However, if all the handwriting specimens from 10 to 90 in the Ayres
Scale were used, one would expect a larger $\sigma_s^2$, and if, as seems plausible,
$\sigma_d^2$ remained about the same, the result would be a higher coefficient than that
found here using only values 50, 70, and 80. On the other hand, if one used
only specimens 50, 60, and 70, a slightly smaller $\sigma_s^2$ and (if $\sigma_d^2$ remained about
the same) a slightly lower reliability would be expected.

[TABLE 5. Analysis of variance, estimated variance components, and
estimated reliabilities for the four sets of data.
(C) = Chicago data; k = 9, N = 100. (P) = Princeton data; k = 9, N = 100.
(T) = these two together; k = 9, N = 200. (B) = baseball data from
Mosteller [13]; k = 8, N = 22.]

It can be seen that even though Mosteller’s chi square goodness of fit
test, $\chi^2_D$, shows clearly that the handwriting data deviate significantly from
a linear scale, nevertheless the scales show a satisfactory agreement with
the linear model, about .95 for the case where the nine handwriting specimens
were rated by 100 or 200 judges. Since only $2\sigma_s^2$ is considered to be true
variance, the coefficients given by $r_1$ and $r_{11}$ will be what are usually termed
“conservative” estimates. A “dashing” estimate for reliability is obtained
by regarding $\sigma_d^2$ as part of the true variance rather than as part of the error
variance. Thus we have

$$r_2 = \text{est}\,\frac{2\sigma_s^2 + \sigma_d^2}{2\sigma_s^2 + \sigma_d^2 + \sigma_b^2} = \frac{T - 1/N}{T}.$$

This definition yields for the handwriting data reliabilities of .98 or .99.
This coefficient represents the correlation between two sets of $\theta$ values for
the same stimuli judged by another random sample of people from the same
population. Coefficients computed from this formula are appropriate to the
recomparison of a randomly selected pair of the nine specimens against a
background of seven other handwriting specimens drawn from a population
having the same $\sigma_s^2$ and also the same peculiarities that produced the
deviations from linearity. One possibility is a recomparison of a random pair
against a background of the same seven other handwriting specimens. Thus
we see that without any assumptions about the law of comparative judgment
one has a set of stimuli that cannot be regarded as indifferent to the subjects.
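The conservative and dashing coefficients can be computed from the mean squares with a short sketch. It assumes that the coefficient of linear consistency is taken as 2(L − D)/(kT), which reproduces the limiting cases described above (unity when D = 0 and 2L/k = T, zero when T, L, and D are equal), and that the dashing coefficient is taken as (T − 1/N)/T. The function name and the numerical inputs are ours.

```python
def consistency_coefficients(t, l, d, k, n_judges, l_alt=None):
    """Variance ratios from the total (t), linear (l), and residual (d) mean squares.

    r1  : conservative ratio, only 2 * sigma_s^2 counted as true variance.
    r11 : the same ratio using the alternative linear mean square l_alt.
    r2  : "dashing" ratio, everything except binomial error counted as true.
    """
    result = {
        "r1": 2.0 * (l - d) / (k * t),
        "r2": (t - 1.0 / n_judges) / t,
    }
    if l_alt is not None:
        result["r11"] = 2.0 * (l_alt - d) / (k * t)
    return result


k = 9

# Perfect agreement: S_D = 0 and S_L = S_T = 26 (hypothetical value).
perfect = consistency_coefficients(t=2.0 * 26.0 / (k * (k - 1)), l=26.0 / (k - 1),
                                   d=0.0, k=k, n_judges=100)

# No scale at all: the three mean squares are equal.
no_scale = consistency_coefficients(t=0.05, l=0.05, d=0.05, k=k, n_judges=100)

# Pure binomial variation: the total mean square equals 1/N.
pure_binomial = consistency_coefficients(t=0.01, l=0.01, d=0.01, k=k, n_judges=100)

print(perfect["r1"], no_scale["r1"], pure_binomial["r2"])
```

The three cases correspond to the reference points mentioned in the text: a perfect linear fit, mean squares that do not differ, and proportions that are random binomial deviations from .5.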

A corresponding chi square is given by $\chi^2_T = N S_T$ with (k/2)(k − 1)
degrees of freedom. These values of chi square ($\chi^2_T$ in Table 5) are all extremely
large, indicating a negligible probability that the data could have arisen by
random sampling from a population in which the proportions were all .5.

The coefficient $r_2$, which is zero if the proportions of Table 1 are all
random binomial deviations from .5, may be compared with Kendall’s
coefficient of agreement ([10], pp. 125ff.; [11], pp. 333ff.), which is unity only
if all proportions are 1.0 or 0.0, i.e., if there is complete agreement among all
judges in making each judgment. Kendall’s coefficient of agreement is
determined directly from the experimental frequencies, without using any
transforms such as the arc sine. The data here presented cannot be regarded
as showing such agreement among all judges. However, they clearly cannot
be regarded as indicating only random judgments.
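For comparison, Kendall’s coefficient of agreement can be sketched as follows. The form used here, u = 2Σ / [C(m, 2) C(n, 2)] − 1 with Σ the sum of C(γ, 2) over all cells of the preference table, is assumed to be the usual one for m judges and n objects. The counts are hypothetical and the function name is ours.

```python
from math import comb


def kendall_agreement(counts, m):
    """Coefficient of agreement u for paired comparisons by m judges.

    counts[i][j] is the number of judges preferring object i to object j,
    so counts[i][j] + counts[j][i] == m for every pair i != j.
    """
    n = len(counts)
    sigma = sum(comb(counts[i][j], 2)
                for i in range(n) for j in range(n) if i != j)
    return 2.0 * sigma / (comb(m, 2) * comb(n, 2)) - 1.0


# Hypothetical tables for n = 3 objects and m = 4 judges.
unanimous = [[0, 4, 4],
             [0, 0, 4],
             [0, 0, 0]]
evenly_split = [[0, 2, 2],
                [2, 0, 2],
                [2, 2, 0]]

u_max = kendall_agreement(unanimous, m=4)
u_min = kendall_agreement(evenly_split, m=4)
print(u_max, u_min)
```

The coefficient reaches unity only under complete agreement and takes its minimum, −1/(m − 1) for an even number of judges, when every pair is evenly split.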

We may compare these coefficients computable for a single set of data
with more conventional reliabilities obtained by comparing the Princeton
with the Chicago scale values. The correlation between the two sets of values
in Table 2, $r_{12}$, is .989, which, it may be noted, is similar in magnitude to $r_2$.













If we make no allowance for changes in discriminal dispersion, but take the 
entire difference of scale values adjusted to a common mean but not to a 
common variance as error, then 


2 
ni Sag. cas, 
2£ +. 3¥ 
which is similar in magnitude to the estimates of p, . 

Two coefficients have been suggested. The coefficient $r_2$ indicates the
extent to which the stimuli are differentiated by the subjects. It seems
reasonable to regard $r_1$ or $r_{11}$ as a conservative estimate of consistency for a single
set of data scaled by the law of comparative judgment. In such a case there
would be no replication to indicate that $\sigma_d^2$ might, from some points of view,
reasonably be regarded as part of the true variance. The estimates $r_1$ and $r_{11}$
give a direct measure of the agreement between the observed $\theta$ and fitted
$\theta^*$ values of the $\arcsin \sqrt{p}$.

The lines labelled B in Table 5 give for comparison the data on baseball
teams reported by Mosteller [13]. It is interesting to note that despite the
nonsignificant chi square, the reliability, $r_1$ or $r_{11}$, is only .73, while $r_2 = .62$.
This low reliability is due apparently to the similarity of the different teams,
since est $\sigma_s^2$ is only .0439, which is less than the binomial variation of .0455
with which $\sigma_s^2$ must be combined. Under these circumstances it is not
surprising that chi square is not significant, especially with N as low as 22. On
the other hand, the data on handwriting have a smaller binomial variance
(.01), and a much larger $\sigma_s^2$ (about .3). Despite the fact that the residual
mean square D is slightly smaller than that for the baseball data, when N
equals 100 or 200 with 28 degrees of freedom, this much smaller discrepancy
cannot be regarded as due to chance.

In summary, a variance-components analysis has been presented for
paired comparisons. This analysis gives estimates of the variance of the
actual scale values, $\sigma_s^2$, and the variance of observations due to deviations
of the data from the linear paired comparisons model, $\sigma_d^2$, which are compared
with the binomial sampling variance $\sigma_b^2$. A variety of coefficients based on
these three variances is also presented. If one is interested in asking whether
or not the subjects’ responses are purely random, then Kendall’s coefficient
of agreement, or the $r_2$ as presented here, may be used. If one is interested
in the extent to which the law of comparative judgment accounts for the
data, then $r_1$ or $r_{11}$ would be the appropriate coefficient.





REFERENCES 


[1] Ayres, L. P. A scale for measuring the quality of handwriting of school children. New
York: Russell Sage Foundation, 1912. 

[2] Bennett, C. A. and Franklin, N. L. Statistical analysis in chemistry and the chemical 
industry. New York: Wiley, 1954. 







[3] Cochran, W. G. The χ² test of goodness of fit. Ann. math. Statist., 1952, 23, 315-345.
[4] Davies, O. L. (Ed.) Design and analysis of industrial experiments. New York: Hafner,
1954.
[5] Duncan, A. J. Quality control and industrial statistics. Chicago: Richard D. Irwin, 1952.
[6] Fisher, R. A. Statistical methods for research workers. (10th ed.) London: Oliver and
Boyd, 1946.
[7] Fisher, R. A. and Yates, F. Statistical tables for biological, agricultural and medical
research. New York: Hafner, 1953.
[8] Freeman, M. F. and Tukey, J. W. Transformations related to the angular and the square
root. Ann. math. Statist., 1950, 21, 607-611.
[9] Hald, A. Statistical tables and formulas. New York: Wiley, 1952.
[10] Kendall, M. G. Rank correlation methods. London: Griffin, 1948.
[11] Kendall, M. G. and Babington-Smith, B. On the method of paired comparisons.
Biometrika, 1940, 31, 324-345.
[12] Mood, A. M. Introduction to the theory of statistics. New York: McGraw-Hill, 1950.
[13] Mosteller, F. Remarks on the method of paired comparisons. III. A test of significance
for paired comparisons when equal standard deviations and equal correlations are
assumed. Psychometrika, 1951, 16, 207-218.
[14] Thurstone, L. L. Psychophysical analysis. Amer. J. Psychol., 1927, 38, 368-389.
[15] Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34, 273-286.
[16] Tippett, L. H. C. The methods of statistics. (4th ed.) New York: Wiley, 1952.


Manuscript received 7/1/57 
Revised manuscript received 9/25/57 











PSYCHOMETRIKA—VOL. 23, NO. 2
JUNE, 1958 


AN INTER-BATTERY METHOD OF FACTOR ANALYSIS* 


LEDYARD R TUCKER


EDUCATIONAL TESTING SERVICE 
AND 
PRINCETON UNIVERSITY 


The inter-battery method of factor analysis was devised to provide 
information relevant to the stability of factors over different selections of 
tests. Two batteries of tests, postulated to depend on the same common factors, 
but not parallel tests, are given to one sample of individuals. Factors are 
determined from the correlation of the tests in one battery with the tests in 
the other battery. These factors are only those that are common to the two 
batteries. No communality estimates are required. A statistical test is pro- 
vided for judging the minimum number of factors involved. Rotation of axes 
is carried out independently for the two batteries. A final step provides the 
correlation between factors determined by scores on the tests in the two 
batteries. The correlations between corresponding factors are taken as factor 
reliability coefficients. 


Factor analysis has been used for a number of years in the explorations 
for basic mental traits. Results from the numerous studies have indicated 
the existence of a number of these traits with several traits being relatively 
firmly established [11]. Sets of reference tests to represent 16 such abilities 
have been prepared by special committees of psychologists [12, 13]. A universal 
index for psychological factors has been proposed by Cattell [7]. There have 
been a number of criticisms of the factorial methods, however, on the grounds 
of unknown stability of results. Serious questions have been raised concerning 
the justification of factor analysis results and propriety of use of the method. 
McNemar [21], for example, reported an empirical study of factorial stability 
in which he concluded that the first centroid factor loadings had standard 
errors approximating those for correlation coefficients but that the second 
and succeeding centroid factor loadings had much larger standard errors. 
He has criticized factor analysts for analyzing and interpreting results much 
beyond any point justified by their data [23]. A need exists for a more defini- 
tive method of factor analysis which yields coefficients indicative of the 
appropriateness of accepting the obtained results. 

Theoretical developments on stability of factor analysis results have 

*This research was jointly supported by Princeton University and the Office of Naval
Research under contract N6onr-270-20 and the National Science Foundation under grant
NSF G-642; Harold Gulliksen, principal investigator. The preparation of this paper and
the supporting material has been aided by the Educational Testing Service. The author
is grateful to Professors Harold Gulliksen and Samuel S. Wilks for their many most helpful
comments and suggestions.




proven to be very difficult due to the complexity of the problem. Even so, 
some progress has been made. In addition to the study by McNemar, several 
improved methods and sampling error formulas have been developed by 
Bartlett [2, 3, 4], Burt [5, 6], Lawley [17, 18, 19, 20], and Rao [24]. Other 
related work has been published by Emmett [9], Henrysson [14], Hoel [15], 
McNemar [22], Rippe [25], Wold [28], and Young [29]. The procedures 
indicated are so complex, however, that the computations for usual-sized 
factor analyses are feasible only with the use of large electronic computers. 
These developments have attacked only one of the problems of stability of 
results, the statistical significance of the results when compared with possible 
chance results from unrelated measures for a random sample of individuals. 
This is, indeed, an important problem to the psychologist, but it might be 
classified as an operational problem which might be overcome by repetitive 
studies on several samples of individuals. The samples might be made large 
enough to support the results obtained. 

A second and more important problem to the psychologist is the stability 
of factors when there are changes in the battery of tests analyzed. How well 
can factors be identified between two analyses when different tests are used? 
Do the factors transcend a particular battery? Thurstone ([26], p. 360) states
the problem as follows: “The problem of factorial invariance should be
analyzed with regard to the central purpose of factor analysis, the object of
which is to discover a set of significant and meaningful parameters for
describing a domain.” He further states ([26], p. 361), “It is a fundamental
criterion of a valid method of isolating primary abilities that the weights of
the primary abilities for a test must remain invariant when it is moved from
one battery to another which involves the same common factors.” We wish
to add a criterion that the scores for individuals on a factor should remain 
invariant as the individuals are tested with different batteries which involve the 
factor. Surely these are the important propositions to psychologists if the 
factors are to be considered as basic traits. 

In order to quantify the problem of factor stability over changes in a 
test battery, a factor reliability coefficient is proposed. Consider that two 
distinct batteries of tests are administered to one sample of people. Let these 
batteries be designated as battery 1 and battery 2. 

The matrices of intercorrelations for each of these batteries may be
computed and designated $R_{11}$ and $R_{22}$. In addition a matrix of correlations
between tests in battery 1 and tests in battery 2 may be obtained and
designated $R_{12}$ (or its transpose $R_{21}$). The complete matrix of intercorrelations
for all tests given is represented as a supermatrix


(1)     $R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}.$


By employing the matrix $R_{12}$, the correlation may be determined between










any factor obtained from battery 1 and any factor obtained from battery 2. 
Let $p_1$ represent a factor obtained from battery 1 and $q_2$ represent a factor
obtained from battery 2. Suppose that one of the factors $q_2$ is matched with
factor $p_1$ and that this matched $q_2$ is designated $p_2$. The correlation between
$p_1$ and $p_2$ can be interpreted as a factor reliability coefficient for factor $p$.
A high value of this coefficient would indicate high factor stability from 
battery 1 to battery 2. A low value of this coefficient would indicate little 
correspondence between the factors for the two batteries. Thus, the factor 
reliability coefficients would yield a quantitative answer to the problem of 
factorial stability associated with changes in the test batteries. 
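
As a rough illustration of how $R_{12}$ can yield a correlation between a factor from each battery, the sketch below assumes that each factor score is a weighted sum of standardized test scores. The function names, the weight vectors, and the small correlation matrices are hypothetical values chosen only for illustration; they are not data from the study, and the weighting used for the factor reliability coefficients proper may differ.

```python
import math

def bilinear(w, m, v):
    """Return w' M v for plain Python lists."""
    return sum(w[i] * m[i][j] * v[j] for i in range(len(w)) for j in range(len(v)))

def composite_correlation(w1, w2, r11, r12, r22):
    """Correlation between the composites w1'z1 (battery 1) and w2'z2 (battery 2).

    Assumes z1 and z2 are standard scores with correlations r11, r12, r22.
    """
    covariance = bilinear(w1, r12, w2)
    variance_1 = bilinear(w1, r11, w1)
    variance_2 = bilinear(w2, r22, w2)
    return covariance / math.sqrt(variance_1 * variance_2)

# Hypothetical two-test batteries (illustrative numbers only).
r11 = [[1.0, 0.6], [0.6, 1.0]]
r22 = [[1.0, 0.5], [0.5, 1.0]]
r12 = [[0.40, 0.30], [0.35, 0.45]]
reliability = composite_correlation([1.0, 1.0], [1.0, 1.0], r11, r12, r22)
print(round(reliability, 3))
```

Only the off-diagonal block enters the numerator; the within-battery matrices are needed solely to scale the two composites.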

The inter-battery method of factor analysis to be described here depends
on a finding that factor matrices on reference factors can be determined for
the two batteries from just the matrix of correlations $R_{12}$ between the two
batteries. It is to be noted that only the factors common to the two batteries
are obtained and not factors that are represented in only one of the
two batteries. The intercorrelations for each battery are used only in obtaining
factor variances for a test of statistical significance for the factors determined
and for the factor reliability coefficients. This procedure has certain
similarities to Hotelling’s most predictable criterion [16] and Bartlett’s external
factor analysis [1]. One feature resulting from use of only the matrix $R_{12}$ is
that communalities are not involved in the factoring.

An example of the inter-battery method of factor analysis given here 
uses data published by L. L. Thurstone and T. G. Thurstone [27]. Among 
the tests included in this study were a number intended to depend upon the 
word fluency factor and several more intended for the verbal factor. As shown 
in Table 1, nine tests have been distributed into two batteries. Battery 1 
includes two word fluency tests and two verbal tests. Battery 2 includes 
three word fluency tests and two verbal tests. Although this selection and 
distribution of tests might seem to have depended upon the analysis by the 
Thurstones of the data to be used in our example, pretend for present purposes 
that the decisions on selection and distribution of tests to batteries had 
preceded the analysis. This does not seem too unreasonable since both of 
these factors had been isolated previously and the tests were included by 
the Thurstones for these factors. The tests have been assigned arbitrarily 
to the two batteries. 

The intercorrelations of the tests are given in Table 2. $R_{11}$ contains the
intercorrelations for battery 1. $R_{22}$ contains the intercorrelations for battery
2. These two tables will be used in only the statistical tests for the factors
and the computation of the coefficients of factor reliability. The off-diagonal
matrices $R_{12}$ and $R_{21}$ will be used in the determination of the factor matrices.
It is to be noted that these two matrices are the transposes of each other and
contain the correlations between the two batteries. The highest correlations,
.6 to .7, in $R_{12}$ are for variables 45 and 46 of battery 1 with variables 10 and










TABLE 1

Tests Selected for the Example of Inter-Battery Method of Factor Analysis

Major          Battery 1                                  Battery 2
Factor         Test                                       Test
               No.   Name                                 No.   Name

Word           42    Prefixes                             23    First and Last Letters
Fluency        54    Suffixes                             24    First Letters
                                                          27    Four-letter Words

Verbal         45    Chicago Reading Test: Vocabulary     10    Completion
               46    Chicago Reading Test: Sentences      51    Same or Opposite

TABLE 2

Intercorrelations of the Tests

                     Battery 1                         Battery 2
Test         42     54     45     46        23     24     27     10     51

Battery 1
  42       1.000   .554   .227   .189      .461   .506   .408   .280   .241
  54        .554  1.000   .296   .219      .479   .530   .425   .311   .311
  45        .227   .296  1.000   .769      .237   .243   .304   .718   .730
  46        .189   .219   .769  1.000      .212   .226   .291   .681   .661

Battery 2
  23        .461   .479   .237   .212     1.000   .520   .514   .313   .245
  24        .506   .530   .243   .226      .520  1.000   .473   .348   .290
  27        .408   .425   .304   .291      .514   .473  1.000   .374   .306
  10        .280   .311   .718   .681      .313   .348   .374  1.000   .672
  51        .241   .311   .730   .661      .245   .290   .306   .672  1.000

The upper left block is $R_{11}$, the upper right block is $R_{12}$, the lower left
block is $R_{21}$, and the lower right block is $R_{22}$.
















51 of battery 2. Correlations of .4 to .5 occur for variables 42 and 54 of battery 
1 with variables 23, 24, and 27 of battery 2. These two groups of higher 
correlations correspond to the postulated verbal and word fluency factors. 
This is exactly what should have been expected because the possession of 
common factors should raise the correlations in such a pattern. 


Determination of Inter-Battery Factors 
The fundamental factor theorem given by Thurstone [26] in matrix 
form is 
(2) R = FF’, 


where the factors are uncorrelated. For our case of two batteries of tests, 
the factor matrix F may be considered as a supermatrix 


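(3)     $F = \begin{bmatrix} A_1 & G_1 & 0 \\ A_2 & 0 & G_2 \end{bmatrix},$


where $A_1$ is a matrix for battery 1 and $A_2$ is a matrix for battery 2 for factors
common to the two batteries. $G_1$ contains factors appearing only in battery
1 and $G_2$ contains factors appearing only in battery 2. Substitution of (1)
and (3) into (2) yields


(4)     $\begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix} = \begin{bmatrix} A_1 & G_1 & 0 \\ A_2 & 0 & G_2 \end{bmatrix} \begin{bmatrix} A_1' & A_2' \\ G_1' & 0 \\ 0 & G_2' \end{bmatrix},$


from which is obtained

(5)     $R_{12} = A_1 A_2'.$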


A least squares fit to $R_{12}$ may be obtained for any rank $u$ (number of
factors) of $A_1$ and $A_2$ from a development by Eckart and Young [8] for the
approximation of one matrix by another of lower rank. The application of
the Eckart and Young development to the present case is sketched as follows.

Let there be $n_1$ tests in battery 1 and $n_2$ tests in battery 2. Further,
let the tests in battery 1 be designated $j$ (or $J$) and the tests in battery 2 be
designated $k$ (or $K$). Define a matrix $H_1$ with entries $h_{jJ}$ by

(6.1)   $H_1 = R_{12} R_{12}',$
and
(7.1)   $h_{jJ} = \sum_{k=1}^{n_2} r_{jk} r_{Jk}.$

Note that $H_1$ is symmetric and that the diagonal entries are the sums of
squares of the correlations in rows of $R_{12}$.

(8.1)   $h_{jj} = \sum_{k=1}^{n_2} r_{jk}^2.$

Then, the sum of the diagonal entries of $H_1$ is the sum of squares of correlations
in $R_{12}$.

(9.1)   $\sum_{j=1}^{n_1} h_{jj} = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2.$
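
Equations (6.1) through (9.1) can be checked numerically on the example. The sketch below is only illustrative: it takes the inter-battery block $R_{12}$ of Table 2 as given, and the function and variable names are hypothetical.

```python
# Inter-battery correlations R12 of Table 2
# (rows: tests 42, 54, 45, 46; columns: tests 23, 24, 27, 10, 51).
R12 = [
    [0.461, 0.506, 0.408, 0.280, 0.241],
    [0.479, 0.530, 0.425, 0.311, 0.311],
    [0.237, 0.243, 0.304, 0.718, 0.730],
    [0.212, 0.226, 0.291, 0.681, 0.661],
]

def row_product_matrix(r):
    """H1 = R12 R12': scalar products between rows of R12, as in (6.1) and (7.1)."""
    return [[sum(a * b for a, b in zip(row_j, row_J)) for row_J in r] for row_j in r]

H1 = row_product_matrix(R12)
trace_h1 = sum(H1[j][j] for j in range(len(H1)))      # left member of (9.1)
sum_sq_r12 = sum(x * x for row in R12 for x in row)   # right member of (9.1)

for row in H1:
    print(["%.4f" % value for value in row])
print("%.4f" % trace_h1)
```

The two members of (9.1) agree by construction, which is why the total sum of squares never has to be computed separately once $H_1$ is available.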

Consider the characteristic roots and vectors of $H_1$, $\gamma_f^2$ being the root
$f$, and $W_{f1}$ the corresponding unit vector. Entries in $W_{f1}$ are $w_{jf}$. Then, from
properties of characteristic roots and vectors (see, for example, [26], pp.
500-503),

(10.1)  $H_1 W_{f1} \gamma_f^{-2} = W_{f1},$
and
(11.1)  $\sum_{f=1}^{n_1} \gamma_f^2 = \sum_{j=1}^{n_1} h_{jj}.$

Substitution from (9.1) yields

(12)    $\sum_{f=1}^{n_1} \gamma_f^2 = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2.$

For convenience in subsequent discussion, let the roots $\gamma_f^2$ be arranged in
descending order

(13)    $\gamma_1^2 \ge \gamma_2^2 \ge \cdots \ge \gamma_f^2 \ge \cdots \ge \gamma_{n_1}^2.$

Substitution from (6.1) into (10.1) yields

(14.1)  $R_{12} R_{12}' W_{f1} \gamma_f^{-2} = W_{f1}.$

Define a vector $W_{f2}$ with entries $w_{kf}$,

(15.1)  $R_{12}' W_{f1} \gamma_f^{-1} = W_{f2}.$

Substitution from (15.1) into (14.1) yields

(15.2)  $R_{12} W_{f2} \gamma_f^{-1} = W_{f1}.$

Substitution of the value of $W_{f1}$ in (15.2) into the left member of (15.1)
yields

(14.2)  $R_{12}' R_{12} W_{f2} \gamma_f^{-2} = W_{f2}.$

Note that (14.2) is similar in form to (14.1) but involves $W_{f2}$ instead of $W_{f1}$.
Define

(6.2)   $H_2 = R_{12}' R_{12},$

(7.2)   $h_{kK} = \sum_{j=1}^{n_1} r_{jk} r_{jK}.$

The similarity of these equations to (6.1) and (7.1) is to be noted. For $H_1$,
scalar products of row vectors of $R_{12}$ are obtained, while for $H_2$, scalar
products of column vectors of $R_{12}$ are obtained. Substitution of (6.2) into
(14.2) yields

(10.2)  $H_2 W_{f2} \gamma_f^{-2} = W_{f2},$

which parallels equation (10.1). As a consequence, $\gamma_f^2$ is a characteristic root
of $H_2$, and $W_{f2}$ is the corresponding characteristic vector. It can be
demonstrated that $W_{f2}$ is a unit vector.


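Define

(16)    $r_{u \cdot jk} = r_{jk} - \sum_{f=1}^{u} w_{jf} \gamma_f w_{kf},$

(17)    $R_{u \cdot 12} = R_{12} - \sum_{f=1}^{u} W_{f1} \gamma_f W_{f2}',$

where the first $u$ roots $\gamma_f^2$, and vectors $W_{f1}$ and $W_{f2}$ are employed. Since the
roots were ordered in magnitude as in (13), these first $u$ roots are the $u$ largest
roots. As in (6.1) and (7.1) let

(18.1)  $H_{u \cdot 1} = R_{u \cdot 12} R_{u \cdot 12}',$
and
(19.1)  $h_{u \cdot jJ} = \sum_{k=1}^{n_2} r_{u \cdot jk} r_{u \cdot Jk}.$

Again the sum of the diagonal entries in $H_{u \cdot 1}$ is the sum of squares of the
entries in $R_{u \cdot 12}$,

(20.1)  $\sum_{j=1}^{n_1} h_{u \cdot jj} = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{u \cdot jk}^2.$

It can be demonstrated that

(21.1)  $\sum_{j=1}^{n_1} h_{u \cdot jj} = \sum_{j=1}^{n_1} h_{jj} - \sum_{f=1}^{u} \gamma_f^2.$

Thus, from (9.1), (20.1), and (21.1)

(22)    $\sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{u \cdot jk}^2 = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2 - \sum_{f=1}^{u} \gamma_f^2.$

Equation (22) provides a means for determining the closeness of approximation
from the original correlations and the roots extracted. It is not necessary
to determine the smaller roots nor to compute the matrix $R_{u \cdot 12}$.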
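
Equation (22) can be checked on the example by forming the residuals of (16) explicitly and comparing their sum of squares with the shortcut. This is a sketch only: the roots and unit vectors are the four-place values of Table 3 typed in as constants, so agreement holds only up to rounding, and the function names are hypothetical.

```python
# R12 of Table 2 (rows: tests 42, 54, 45, 46; columns: tests 23, 24, 27, 10, 51).
R12 = [
    [0.461, 0.506, 0.408, 0.280, 0.241],
    [0.479, 0.530, 0.425, 0.311, 0.311],
    [0.237, 0.243, 0.304, 0.718, 0.730],
    [0.212, 0.226, 0.291, 0.681, 0.661],
]
# gamma_f and the unit vectors W1, W2 as rounded in Table 3 (columns are factors I, II).
GAMMA = [1.8805, 0.6754]
W1 = [[0.4203, 0.5669], [0.4610, 0.5391], [0.5729, -0.4569], [0.5316, -0.4234]]
W2 = [[0.3526, 0.4760], [0.3809, 0.5417], [0.3703, 0.2936],
      [0.5501, -0.4293], [0.5394, -0.4577]]

def residual_sum_of_squares(u):
    """Left member of (22): sum of squared residuals r_{u.jk} defined by (16)."""
    total = 0.0
    for j, row in enumerate(R12):
        for k, r_jk in enumerate(row):
            fitted = sum(W1[j][f] * GAMMA[f] * W2[k][f] for f in range(u))
            total += (r_jk - fitted) ** 2
    return total

def shortcut_sum_of_squares(u):
    """Right member of (22): total sum of squares minus the first u roots."""
    return sum(x * x for row in R12 for x in row) - sum(g * g for g in GAMMA[:u])

for u in (1, 2):
    print(u, round(residual_sum_of_squares(u), 4), round(shortcut_sum_of_squares(u), 4))
```

The shortcut needs only the roots, which is the practical point of (22): the residual matrix itself never has to be formed.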










In order to obtain the factor matrices $A_1$ and $A_2$ from the foregoing
characteristic vectors let

(23.1)  $A_{f1} = W_{f1} \gamma_f^{1/2},$

(23.2)  $A_{f2} = W_{f2} \gamma_f^{1/2}.$

$A_{f1}$ is column $f$ of $A_1$, and $A_{f2}$ is column $f$ of $A_2$. Let $a_{jf}$ be the entry in row
$j$ of $A_{f1}$, and $a_{kf}$ be the entry in row $k$ of $A_{f2}$. It is to be noted that $A_1$ and
$A_2$ will each have $u$ columns. Equations (23.1) and (23.2) then can be written

(24.1)  $a_{jf} = w_{jf} \gamma_f^{1/2},$

(24.2)  $a_{kf} = w_{kf} \gamma_f^{1/2}.$

Substitution from (24.1) and (24.2) into (16) yields

(25)    $r_{u \cdot jk} = r_{jk} - \sum_{f=1}^{u} a_{jf} a_{kf}.$

This equation may be written in matrix form as

(26)    $R_{u \cdot 12} = R_{12} - A_1 A_2'.$

A comparison of (26) with (5) indicates that $R_{u \cdot 12}$ is the matrix of errors
in approximating $R_{12}$ by $A_1 A_2'$. Equation (22) gives the sum of squares of
these errors. By selecting the $u$ largest roots, $\gamma_f^2$, this sum of squares is made
a minimum for a given number, $u$, of factors. Computations could be
accomplished as outlined below.


1. Select the battery with the smaller number of variables to be battery 1.
Thus

(27)    $n_1 \le n_2.$

2. Compute the matrix $H_1$ from (6.1),

(6.1)   $H_1 = R_{12} R_{12}'.$

3. Determine the characteristic roots, $\gamma_f^2$, and vectors, $W_{f1}$, of $H_1$.
This step might be accomplished by one of the iterative methods such as
Hotelling’s method for determination of principal axes or components.
(For a description see [26], pp. 483-484.) In this case $H_1$ is treated like a
correlation matrix or a covariance matrix (but the diagonals are not
adjusted for communalities). When a principal axis is found, it is adjusted to
a unit vector by dividing each loading by the square root of the characteristic
root. (Note that “latent root” and “characteristic root” are synonymous.)
An alternative is to have these roots and vectors computed on an electronic
computer.

4. Arrange the characteristic roots in descending order and determine
from (22) the sum of squares of residual correlations, $r_{u \cdot jk}$,

(22)    $\sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{u \cdot jk}^2 = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2 - \sum_{f=1}^{u} \gamma_f^2,$

for successive removal of factors $f$. That is,

        $\sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{u \cdot jk}^2$

is to be determined successively for $u$ of 1, of 2, of 3, etc. When this sum of
squares of residuals is as small as desired, the preceding factors are employed.
If the characteristic roots and vectors are determined in step 3 by a method
(such as Hotelling’s) which yields the roots and vectors one at a time in
descending order of the roots, each successive root may be tested in this
step before the next root and vector is determined.

5. When the characteristic vectors $W_{f1}$ are determined, the vectors
$W_{f2}$ are computed from (15.1)

(15.1)  $R_{12}' W_{f1} \gamma_f^{-1} = W_{f2}.$

6. A check on the determination of the characteristic vectors, $W_{f1}$ and
$W_{f2}$, and the roots $\gamma_f^2$ is provided by (15.2).

(15.2)  $R_{12} W_{f2} \gamma_f^{-1} = W_{f1}.$

A further check should be made that both $W_{f1}$ and $W_{f2}$ are unit vectors,
that is,

(28.1)  $\sum_{j=1}^{n_1} w_{jf}^2 = 1,$

(28.2)  $\sum_{k=1}^{n_2} w_{kf}^2 = 1.$

7. Compute the columns of the factor matrices $A_1$ and $A_2$ by (23.1)
and (23.2).

(23.1)  $A_{f1} = W_{f1} \gamma_f^{1/2},$

(23.2)  $A_{f2} = W_{f2} \gamma_f^{1/2}.$
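
The seven steps can be sketched in a few lines of code. The version below is only one possible arrangement under stated assumptions: it uses simple power iteration with deflation as a stand-in for the iterative method of step 3 (not necessarily the exact computing routine intended), the function and dictionary key names are hypothetical, and the input is the block $R_{12}$ of Table 2.

```python
import math

# R12 of Table 2 (rows: tests 42, 54, 45, 46; columns: tests 23, 24, 27, 10, 51).
R12 = [
    [0.461, 0.506, 0.408, 0.280, 0.241],
    [0.479, 0.530, 0.425, 0.311, 0.311],
    [0.237, 0.243, 0.304, 0.718, 0.730],
    [0.212, 0.226, 0.291, 0.681, 0.661],
]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def largest_root(h, iterations=200):
    """Power iteration: largest characteristic root of h and its unit vector."""
    v = [1.0] * len(h)
    for _ in range(iterations):
        hv = mat_vec(h, v)
        norm = math.sqrt(sum(x * x for x in hv))
        v = [x / norm for x in hv]
    root = sum(a * b for a, b in zip(v, mat_vec(h, v)))
    return root, v

def inter_battery_factors(r12, u):
    """Steps 2 to 7 for u factors; r12 has no more rows than columns (step 1)."""
    r21 = [list(col) for col in zip(*r12)]
    h1 = [[sum(a * b for a, b in zip(rj, rJ)) for rJ in r12] for rj in r12]  # step 2
    factors = []
    for _ in range(u):
        root, w1 = largest_root(h1)                      # step 3
        gamma = math.sqrt(root)
        w2 = [x / gamma for x in mat_vec(r21, w1)]       # step 5, eq. (15.1)
        back = [x / gamma for x in mat_vec(r12, w2)]     # step 6, eq. (15.2)
        factors.append({
            "root": root,
            "w1": w1,
            "w2": w2,
            "check_error": max(abs(a - b) for a, b in zip(back, w1)),
            "a1": [x * math.sqrt(gamma) for x in w1],    # step 7, eq. (23.1)
            "a2": [x * math.sqrt(gamma) for x in w2],    # step 7, eq. (23.2)
        })
        # Remove the factor just found so that the next largest root emerges.
        h1 = [[h1[i][j] - root * w1[i] * w1[j] for j in range(len(h1))]
              for i in range(len(h1))]
    return factors

factors = inter_battery_factors(R12, 2)
total = sum(x * x for row in R12 for x in row)
for f in factors:                                        # step 4, eq. (22)
    total -= f["root"]
    print(round(f["root"], 4), round(total, 4), [round(a, 3) for a in f["a1"]])
```

Because the roots come out one at a time in descending order, the residual sum of squares of step 4 can be inspected after each pass, and extraction can stop as soon as it is small enough.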


Computation of factor loadings for our example is given in Table 3.
The matrix $H_1$ of (6.1) was obtained by computing the sums of squares of
rows and sums of cross products between rows of the off-diagonal correlation
matrix $R_{12}$ of Table 2. A similar matrix $H_2$ could have been computed from
the sums of squares of the columns and sums of products between columns
of $R_{12}$, but it would have had five rows and columns. Only one of these
matrices is required. The characteristic vectors of $H_1$ were obtained, the
first two being listed as columns of the matrix $W_1$. The corresponding roots
are given in the row $\gamma^2$ near the bottom of the table. The sum of squares of










the correlations in $R_{12}$ was 3.9934. After one factor the sum of squares of the
residual correlations, $r_{u \cdot jk}$, was .4570 as given in the bottom row of Table 3.
After two factors, the sum of squares of the residual correlations was down
to .0008. Since this indicated quite small errors of approximation, and since
the subsequent statistical test indicated that no more factors were justified
by the data of our example, the two factors were considered to be sufficient.
The matrix $W_2$ was found by (15.1).

When the vectors of $W_{f2}$ and the values of $\gamma_f^{-1}$ were substituted into
(15.2), the vectors $W_{f1}$ were obtained. This provided a check on the
determination of the characteristic roots and vectors. That the columns of $W_1$ and
$W_2$ were unit vectors was checked by verifying that the sums of squares of
the entries in these columns were unity (within rounding error). Each column
of both of these $W$ matrices was multiplied by the corresponding $\gamma_f^{1/2}$ as is
indicated in (23.1) and (23.2). This produces common factor matrices $A_1$
and $A_2$. These matrices reproduce the given matrix $R_{12}$ in a least square
sense by equation (5).


TABLE 3

Determination of Inter-Battery Reference Factors

Battery 1
                                                 $W_1$ (characteristic
Test          $H_1 = R_{12} R_{12}'$                  vectors of $H_1$)       $A_1 = W_1 \gamma^{1/2}$
No.       42      54      45      46            I        II            I       II

42      .7715   .8244   .7332   .6808         .4203    .5669         .576    .466
54      .8244   .8844   .8218   .7624         .4610    .5391         .632    .443
45      .7332   .8218  1.2561  1.1651         .5729   -.4569         .786   -.376
46      .6808   .7624  1.1651  1.0814         .5316   -.4234         .729   -.348

Battery 2
Test      $R_{12}' W_1$           $W_2 = R_{12}' W_1 \gamma^{-1}$      $A_2 = W_2 \gamma^{1/2}$
No.        I        II             I        II                 I       II

23       .6631    .3215          .3526    .4760              .484    .391
24       .7164    .3659          .3809    .5417              .522    .445
27       .6963    .1983          .3703    .2936              .508    .241
10      1.0344   -.2900          .5501   -.4293              .754   -.353
51      1.0143   -.3091          .5394   -.4577              .740   -.376

Characteristic roots of $H_1$
                                                             I        II
$\gamma^2$                                                 3.5364    .4562
$\gamma$                                                   1.8805    .6754
$\gamma^{1/2}$                                             1.3713    .8218
$\sum_j \sum_k r_{jk}^2 - \sum_{f=1}^{u} \gamma_f^2$        .4570    .0008


It is to be noted that the factor matrices $A_1$ and $A_2$ may be subjected
to transformations of axes. The rotational problem of the regular methods
still exists. More material on this point will be given in a later section of
this paper.


Maximum Inter-Battery Covariances 


A second interpretation for the $W$ matrices is of interest. Each column
vector, or factor, in these matrices gives a set of weights that may be applied
to the observed scores for the corresponding batteries to produce composite
scores. Consider the standard score matrices $S_1$ and $S_2$ for the two batteries.
Tests are row vectors and individuals are column vectors. Entries in $S_1$ are
$s_{ji}$, and entries in $S_2$ are $s_{ki}$. Let

(29.1)  $x_{f1i} = \sum_{j=1}^{n_1} w_{jf} s_{ji},$

(29.2)  $x_{f2i} = \sum_{k=1}^{n_2} w_{kf} s_{ki},$

where $x_{f1i}$ are scores on a weighted composite of scores in battery 1, and
$x_{f2i}$ are corresponding scores on a weighted composite of scores in battery 2.
The weights, $w_{jf}$ and $w_{kf}$, are entries in column $f$ of $W_1$ and of $W_2$,
respectively. The covariance between $x_{f1i}$ and $x_{f2i}$, designated by $c_{f12}$, is

(30)    $c_{f12} = \frac{1}{N} \sum_{i=1}^{N} x_{f1i} x_{f2i},$

where $N$ is the number of individuals in the sample. The weights will be
chosen to maximize this covariance.

Instead of defining the weights in such a way that the variances of $x_{f1i}$
and $x_{f2i}$ are some given constants, the weight vectors $W_{f1}$ and $W_{f2}$ will be
defined here as unit vectors. The former definition leads to the canonical
correlation of Hotelling’s most predictable criterion [16]. Definition of
variances involves the intercorrelation matrices $R_{11}$ and $R_{22}$ and, thus, the
variances or communalities of the tests. In order to avoid the communality
problem, the weight vectors may be internally limited to unit vectors. Some
limit must be placed on the weights, for otherwise $c_{f12}$ could be made
increasingly large by use of increasingly large weights. This restriction is expressed in
(28.1) and (28.2).










The solution for maximum covariance, $c_{f12}$, under the restrictions of unit
weight vectors involves the matrix $H_1$ of equation (6.1) [or matrix $H_2$ of
(6.2)] and its roots and vectors of (10.1). The weight vector $W_{f2}$ is determined
as in (15.1). There are as many orthogonal solutions as there are tests in the
smaller battery. It turns out that

(31)    $c_{f12} = \gamma_f;$

thus, the maximum $c_{f12}$ is obtained by choosing the largest root. As many
pairs of weight vectors and resulting composite scores may be taken as there
are significant roots $\gamma_f$. An interesting property of this solution is that

(32)    $\frac{1}{N} \sum_{i=1}^{N} x_{f1i} x_{g2i} = 0, \quad (f \ne g).$

Thus, each composite from one battery is uncorrelated with all composites,
except the corresponding one, from the other battery. Each successive pair
of weight vectors, $W_{f1}$ and $W_{f2}$, for successively smaller roots $\gamma_f$, involves
the maximum inter-battery covariance remaining after, and independent of,
the covariances for the preceding pairs of weight vectors.
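
Properties (31) and (32) can be illustrated without individual scores. For standard scores, the covariance between composite $f$ of battery 1 and composite $g$ of battery 2 equals $W_{f1}' R_{12} W_{g2}$, assuming the correlations were computed with divisor $N$. The sketch below evaluates this with the four-place weights of Table 3, so the results hold only up to rounding; the function name is hypothetical.

```python
# R12 of Table 2 (rows: tests 42, 54, 45, 46; columns: tests 23, 24, 27, 10, 51).
R12 = [
    [0.461, 0.506, 0.408, 0.280, 0.241],
    [0.479, 0.530, 0.425, 0.311, 0.311],
    [0.237, 0.243, 0.304, 0.718, 0.730],
    [0.212, 0.226, 0.291, 0.681, 0.661],
]
# Unit weight vectors from Table 3 (columns are factors I and II).
W1 = [[0.4203, 0.5669], [0.4610, 0.5391], [0.5729, -0.4569], [0.5316, -0.4234]]
W2 = [[0.3526, 0.4760], [0.3809, 0.5417], [0.3703, 0.2936],
      [0.5501, -0.4293], [0.5394, -0.4577]]

def cross_covariances(w1, r12, w2):
    """c[f][g] = W_f1' R12 W_g2 for every pair of composites (f, g)."""
    n_factors = len(w1[0])
    return [[sum(w1[j][f] * r12[j][k] * w2[k][g]
                 for j in range(len(w1)) for k in range(len(w2)))
             for g in range(n_factors)]
            for f in range(n_factors)]

C = cross_covariances(W1, R12, W2)
for row in C:
    print([round(value, 4) for value in row])
```

The diagonal entries reproduce the values of $\gamma_f$ in Table 3, as (31) requires, and the off-diagonal entries vanish within rounding, as (32) requires.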


Test of Significance for Factors Determined 


An approximate significance test for the minimum number of factors 
determinable from any given large sample data has been developed for the 
inter-battery method. While this test seems reasonable on intuitive, logical, 
and geometrical grounds, a completely rigorous mathematical development 
of this test, or any variant of it, has not been found. The grounds for the 
reasonableness of the proposed statistical test will be discussed in the sub- 
sequent section. The nature of this test will be presented in this section. 

Table 4 gives the calculations of the proposed statistical test when
applied to the illustrative problem. Three constants of the problem are given
in the line immediately under the heading. These are: $N$, the number of
people on which the correlations are based; $n_1$, the number of tests in battery
1; and $n_2$, the number of tests in battery 2. In the body of the table
there is a column for the original data and one column for data after the
determination of each factor. Row 1 gives the headings for these columns in
terms of the factors already determined, $f$ and $u$ being used as subscripts to
designate factor number. Rows 2 and 3 give the number of tests in the two
batteries decreased by the number of factors already determined as indicated
by the column headings. Row 4 gives the products of the number of tests
decreased by the number of factors. These values are interpreted as the
number of degrees of freedom for a $\chi^2$ distribution.

Rows 5 and 6 repeat information from Table 3. Row 5 contains the
roots $\gamma_f^2$ for the factors. Row 6 contains the sums of squares of the correlations
in $R_{12}$ and the sums of squares of the residual correlations after each factor.








TABLE 4

Statistical Significance Tests for Inter-Battery Reference Factors

$N = 710$        $n_1 = 4$        $n_2 = 5$

 1.  $u$                                                              0          1          2
 2.  $(n_1 - u)$                                                      4          3          2
 3.  $(n_2 - u)$                                                      5          4          3
 4.  $(n_1 - u)(n_2 - u)$ = (d.f.)                                   20         12          6
 5.  $\gamma_f^2$                                                            3.5364      .4562
 6.  $\sum_j \sum_k r_{jk}^2 - \sum_{f=1}^{u} \gamma_f^2$ = (A)   3.9934      .4570      .0008
 7.  $s_{f1}^2$                                                              2.1406     1.1820
 8.  $n_1 - \sum_{f=1}^{u} s_{f1}^2$                              4.0000     1.8594      .6774
 9.  $s_{f2}^2$                                                              2.5604     1.1095
10.  $n_2 - \sum_{f=1}^{u} s_{f2}^2$                              5.0000     2.4396     1.3301
11.  $(n_1 - \sum s_{f1}^2)(n_2 - \sum s_{f2}^2)$ = (B)          20.0000     4.5362      .9010
12.  $\Phi_u = N$(d.f.)(A)/(B)                                     2835.       858.        3.8
13.  $\chi^2$ (d.f. = $(n_1 - u)(n_2 - u)$; $p$ = .01)               38.        26.       16.8
14.  $p$ corresponding to $\Phi_u$



Rows 7 and 9 contain the variances of the scores $x_{f1i}$ and $x_{f2i}$ of (29) on
the factors for the two batteries. These variances depend on the
characteristic vectors $W_{f1}$ and $W_{f2}$ and may be obtained from the following matrix
equations:

(33.1)  $s_{f1}^2 = W_{f1}' R_{11} W_{f1},$

(33.2)  $s_{f2}^2 = W_{f2}' R_{22} W_{f2}.$













It is noted that the intercorrelation matrices $R_{11}$ and $R_{22}$ for the two batteries
are employed. Unity is used in the diagonal cells of these matrices and not
the communalities. Rows 8 and 10 contain the total test variances in each
battery decreased by the factor score variances for the factors already
determined. Row 11 contains the products of the entries in rows 8 and 10.

A coefficient $\Phi_u$ appears in row 12 and is computed from $N$, and entries
in rows 4, 6, and 11. $\Phi_u$ is given by

(34)    $\Phi_u = \dfrac{N (n_1 - u)(n_2 - u) \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{u \cdot jk}^2}{\left(n_1 - \sum_{f=1}^{u} s_{f1}^2\right)\left(n_2 - \sum_{f=1}^{u} s_{f2}^2\right)}.$

In case:

  i. the first $u$ factors are well determined (have large roots),
 ii. the population correlations depend only on the first $u$ factors,
iii. the population density function is multinormal,
 iv. the sample used for the analysis is large and random,

then $\Phi_u$ is approximately distributed as $\chi^2$ with $(n_1 - u)(n_2 - u)$ degrees of
freedom. The $\chi^2$ with these degrees of freedom for a $p$ of .01 is given in line 13.
In line 14, the value of $p$ is listed corresponding to $\Phi_u$. The results for our
example indicate clearly that the first two factors are justified but that the
second factor residuals are so small as not to justify any further factors.
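
The arithmetic of (34) for the example can be sketched as follows. The inputs are the entries of rows 6, 7, and 9 of Table 4; the function name is hypothetical, and the .01 points of $\chi^2$ are typed in from standard tables rather than computed.

```python
def phi_u(n_people, n1, n2, u, residual_ss, s1_sq, s2_sq):
    """Coefficient of (34) after u factors, with its degrees of freedom."""
    df = (n1 - u) * (n2 - u)
    remaining_1 = n1 - sum(s1_sq[:u])   # row 8 of Table 4
    remaining_2 = n2 - sum(s2_sq[:u])   # row 10 of Table 4
    return n_people * df * residual_ss / (remaining_1 * remaining_2), df

N, N1, N2 = 710, 4, 5
RESIDUAL_SS = [3.9934, 0.4570, 0.0008]   # row 6: after 0, 1, and 2 factors
S1_SQ = [2.1406, 1.1820]                 # row 7: factor score variances, battery 1
S2_SQ = [2.5604, 1.1095]                 # row 9: factor score variances, battery 2
CHI2_01 = {20: 37.57, 12: 26.22, 6: 16.81}   # .01 points from standard tables

results = []
for u in range(3):
    value, df = phi_u(N, N1, N2, u, RESIDUAL_SS[u], S1_SQ, S2_SQ)
    results.append((u, df, value, value > CHI2_01[df]))
    print(u, df, round(value, 1), value > CHI2_01[df])
```

The coefficient exceeds the .01 point before any factor and after one factor, but not after two, which matches the conclusion drawn from Table 4.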

In the case of each table of correlations or residual correlations a hypo- 
thesis is made that the coefficients deviate from zero due only to sampling 
of individuals. That is, the hypothesis is made that there are no further 
factors. Acceptance of such a hypothesis does not indicate, however, that 
there are no factors involved for the population. Such factors might be found 
with more extensive data. The statistical test must be understood, therefore, 
as a minimum test which helps support the idea of existence of factors, but does 
not necessarily negate an idea of further factors. 


Justification of Statistical Test 


The statistical test involves a chi square approximation to the theoretical
distribution for correlation coefficients. Fisher [10] found, for a zero population
correlation and a normal bivariate distribution of variables, the distribution
of sample coefficients to be

(35)    $f(r)\,dr = k_1 (1 - r^2)^{\frac{1}{2}(N-4)}\,dr.$

Let

(36)    $\phi = N r^2.$

Then

(37)    $dr = \tfrac{1}{2} \phi^{-1/2} N^{-1/2}\,d\phi.$

Substitution of (36) and (37) into (35) yields

(38)    $f(\phi)\,d\phi = k_1 \left(1 - \frac{\phi}{N}\right)^{\frac{1}{2}(N-4)} \tfrac{1}{2} \phi^{-1/2} N^{-1/2}\,d\phi.$

Note that as $N \to \infty$,

(39)    $\left(1 - \frac{\phi}{N}\right)^{\frac{1}{2}(N-4)} \to e^{-\phi/2}.$

Define

(40)    $k_2 = \tfrac{1}{2} k_1 N^{-1/2}.$

Substitution of (39) and (40) into (38) yields

(41)    $f(\phi)\,d\phi = k_2 \phi^{-1/2} e^{-\phi/2}\,d\phi.$

Equation (41) is the distribution for $\chi^2$ with one degree of freedom. An
empirical comparison of the results of (35) and (41) indicates that, for $N$’s
of 100 or more, values of $r$ for $p$ = .01 as determined from the two
distributions differ by no more than .001.
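
The comparison of (35) and (41) can be reproduced approximately by numerical integration. The sketch below is an assumption-laden illustration: it takes $p$ = .01 to mean the probability that $|r|$ exceeds the critical value, integrates (35) by Simpson's rule, and uses the identity that the upper tail of $\chi^2$ with one degree of freedom at $x$ equals erfc$(\sqrt{x/2})$. The function names are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def exact_critical_r(n_people, p=0.01):
    """Value c with P(|r| > c) = p under the density (35), found by bisection."""
    density = lambda r: (1.0 - r * r) ** ((n_people - 4) / 2.0)
    whole = simpson(density, 0.0, 1.0)
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        tail = 1.0 - simpson(density, 0.0, mid) / whole
        if tail > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def chi_square_critical_r(n_people, p=0.01):
    """Value c with P(chi-square(1 d.f.) > N c^2) = p, as implied by (36) and (41)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if math.erfc(math.sqrt(mid / 2)) > p:
            lo = mid
        else:
            hi = mid
    return math.sqrt(((lo + hi) / 2) / n_people)

exact_100 = exact_critical_r(100)
approx_100 = chi_square_critical_r(100)
print(round(exact_100, 4), round(approx_100, 4), round(approx_100 - exact_100, 4))
```

For $N$ = 100 the two critical values agree to about the third decimal place, and the gap narrows as $N$ grows.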

In order to apply the foregoing frequency distribution to an entire
correlation matrix of the form of $R_{12}$ it is necessary to inquire into the possible
statistical dependence of the several coefficients of correlation involving
common variables. In case these coefficients were independent, the
reproductive property of the chi square distribution would permit the summation
of the several $\phi$’s, which would yield a new coefficient distributed as chi
square with as many degrees of freedom as the number of elements in the
sum. For the case that all population coefficients of correlation (in matrices
$R_{11}$ and $R_{22}$ as well as in matrix $R_{12}$) are zero and large samples are employed,
the sample coefficients of correlation in the matrix $R_{12}$ seem adequately
independent to warrant use of the foregoing sum of the $\phi$’s as presented in
the following paragraphs.

Consider, first, the case of one variable, $j$, in battery 1 and two variables,
$k$ equal to $a$ and $b$, in battery 2. Let $N$ scores on $j$ be drawn at random from
a normally distributed population of scores. Let a large number of samples
of $N$ scores for each of variables $a$ and $b$ be drawn randomly and independently
from this same population, the drawings for each variable $a$ and $b$ being
independent, also. For each sample, the coefficients of correlation $r_{ja}$ and
$r_{jb}$ may be computed and the corresponding coefficients $\phi_{ja}$ and $\phi_{jb}$ may be
determined. The distribution for each of $\phi_{ja}$ and $\phi_{jb}$ is given by (41). Since
scores for variable $b$ are drawn independently of the scores for variable $a$,
the correlation of variable $b$ with variable $j$ will be independent of the
correlation of variable $a$ with variable $j$. For any particular sample when considering
$r_{jb}$, it does not matter how correlated variable $a$ was with variable $j$. It
follows then that $\phi_{ja}$ and $\phi_{jb}$ are independently distributed, also. It does not
follow, however, that $\phi_{ab}$ would be independently distributed from $\phi_{ja}$ and $\phi_{jb}$.
Any two of these coefficients are independently distributed, but the complete
set involves some dependence of distribution. By employing only the
inter-battery correlations in matrix $R_{12}$, such complete sets of intercorrelations
are avoided. Therefore, the entries in any particular row or particular column
of $R_{12}$ and the corresponding $\phi$’s are independently distributed under the
hypothesis that all population correlations, including the intercorrelations
for each battery, are zero.

Consider next the case of two variables, j equal to 1 and 2, in battery 1
and two variables, k equal to a and b, in battery 2. Repeated samplings of N
scores for each variable, drawn independently for each variable, from a
normal universe will yield distributions of the coefficients $\phi_{1a}$,
$\phi_{1b}$, $\phi_{2a}$, $\phi_{2b}$. We have already shown that the pairs
$\phi_{1a} - \phi_{1b}$, $\phi_{2a} - \phi_{2b}$, $\phi_{1a} - \phi_{2a}$,
$\phi_{1b} - \phi_{2b}$ are independently distributed. The pairs
$\phi_{1a} - \phi_{2b}$ and $\phi_{1b} - \phi_{2a}$ are also independently
distributed since the second member of the pair involves variables distinct
from the first member. Consider any triplet of coefficients such as
$\phi_{1a} - \phi_{1b} - \phi_{2a}$. The distribution of each of these
coefficients is independent of the distribution of the other two members of
the triplet since, when any two are fixed, the third has complete freedom of
distribution. For example, let the scores on variable 1 be identical with the
scores on variable b and let the scores on variable 2 be identical with the
scores on variable a. This fixes $\phi_{1b}$ and $\phi_{2a}$ at their maximum
possible values. The distribution between variable 1 and variable a is not
influenced by the preceding identities of scores. Thus, $\phi_{1a}$ is
independently distributed from $\phi_{1b}$ and $\phi_{2a}$.

There is some dependence of distribution, however, for the entire set
of four $\phi$'s. For large samples, this dependency may not be serious and
will tend to disappear as the samples increase in size.

Let $\Phi$ be the sum of the $\phi$'s for the matrix $R_{12}$,

(42)    $\Phi = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} \phi_{jk}$,

or from (36),

(43)    $\Phi = N \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^{2}$.

The foregoing material indicates that there is no dependence among the
several $\phi$'s which affects the first three moments of the distribution of
$\Phi$, that is, the mean, variance, and skewness of $\Phi$. Higher moments
are affected, but to a reduced extent with larger samples. It seems
appropriate to employ the chi square distribution in evaluating $\Phi$. There
will be $n_1 n_2$ degrees of freedom.
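A minimal sketch of the test in (43) follows. The correlation matrix and sample size are hypothetical illustrative values, not data from the study, and the helper `chi2_sf` is an assumption of this sketch: it evaluates the chi square tail for integer degrees of freedom through the standard recurrence for the regularized incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of chi square for a positive integer df."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                  # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))       # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def phi_total(r12, n):
    """Equation (43): N times the sum of squared inter-battery correlations."""
    return n * sum(r * r for row in r12 for r in row)

# Hypothetical 2 x 3 inter-battery correlation matrix and sample size.
r12 = [[0.10, -0.05, 0.20],
       [0.15, 0.02, -0.08]]
n = 200

phi = phi_total(r12, n)
df = len(r12) * len(r12[0])          # n_1 * n_2 degrees of freedom
print(phi, df, chi2_sf(phi, df))
```

A small tail probability would suggest that the inter-battery correlations are not all zero in the population, subject to the large-sample caveats stated above.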










In order to make a statistical test after each factor has been determined 
and before going on to the succeeding factors, it is necessary to make an 
adjustment in the preceding test. The sum of squares of the residual
correlations, $r_{ujk}$, after u factors may be obtained easily from (22) when the roots
of the matrix H, are known. But these residual correlations refer to residual 
variables with lowered variances and involve dependencies due to the use 
of some degrees of freedom in the factors already determined. The necessary 
adjustments will be discussed for the general case when u factors have been 
established and it is desired to test the residual matrix R,,, for justification 
of any further factors. 

Let $V_1$ be an $n_1 \times n_1$ orthogonal matrix containing as its first u
columns the vectors $W_{1f}$ (f = 1, 2, ..., u). Similarly, let $V_2$ be an
$n_2 \times n_2$ orthogonal matrix containing as its first u columns the
vectors $W_{2f}$ (f = 1, 2, ..., u). Let the column vectors of $V_1$ be $v_p$
and the column vectors of $V_2$ be $v_q$. Suppose the vectors $v_p$ and $v_q$
contain weights for producing weighted composites for batteries 1 and 2 in a
manner similar to use of $W_{1f}$ and $W_{2f}$ in (29.1) and (29.2). Thus,

(44.1)    $x_{pi} = \sum_{j=1}^{n_1} v_{jp} s_{ji}$,

(44.2)    $x_{qi} = \sum_{k=1}^{n_2} v_{kq} s_{ki}$.


The matrix of intercovariances for each battery may be obtained from the
following matrix equations:

(45.1)    $C_{1p} = V_1' R_{11} V_1$,
(45.2)    $C_{2q} = V_2' R_{22} V_2$,

where $C_{1p}$ is the matrix of intercovariances of composites p for battery 1,
and $C_{2q}$ is the matrix of intercovariances of composites q for battery 2.
The matrix of covariances $C_{pq}$ between composites p and q (the between
batteries 1 and 2 covariances) may be obtained from

(46)    $C_{pq} = V_1' R_{12} V_2$.

This matrix will be of the form

(47)    $C_{pq} = \begin{bmatrix} \gamma_u & 0 \\ 0 & C_{pq}\,(p, q > u) \end{bmatrix}$,

where $\gamma_u$ is a diagonal matrix containing as diagonal entries
$\gamma_f$ (f = 1, 2, ..., u). Since $V_1$ and $V_2$ are orthogonal matrices,

(48)    $\sum_{p=1}^{n_1} \sum_{q=1}^{n_2} c_{pq}^{2} = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^{2}$.

As a consequence of the form of $C_{pq}$ given in (47), and (22) and (48),

(49)    $\sum_{p=u+1}^{n_1} \sum_{q=u+1}^{n_2} c_{pq}^{2} = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^{2} - \sum_{f=1}^{u} \gamma_f^{2}$

(50)    $\qquad = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{ujk}^{2}$.

Equation (49) yields an efficient procedure for determining the sum of squares
of the entries in the $C_{pq}$ (p, q > u) matrix. It is to be noted that this
matrix is of size $(n_1 - u) \times (n_2 - u)$.

In order to apply the chi square test of (36) and (41) it is necessary to
obtain the $r_{pq}$ from the $c_{pq}$. To do this, the corresponding variances
are required. Since $V_1$ is an orthogonal matrix, the trace (sum of diagonal
entries) of $R_{11}$ yields the trace of $C_{1p}$ in (45.1). Note that the
trace of $R_{11}$ is $n_1$ since all diagonal entries in $R_{11}$ are unity.
Thus,

(51.1)    $\sum_{p=1}^{n_1} c_{pp} = n_1$.

From our definition, $V_1$ contains the vectors $W_{1f}$ as its first u
columns; the corresponding variances $S_{1f}^{2}$ are

(52.1)    $c_{pp} = S_{1f}^{2} = \sum_{j=1}^{n_1} \sum_{J=1}^{n_1} w_{jf} w_{Jf} r_{jJ}$,    $(f = p \le u)$.

Equations (51.1) and (52.1) yield

(53.1)    $\sum_{p=u+1}^{n_1} c_{pp} = n_1 - \sum_{f=1}^{u} S_{1f}^{2}$,    $(p > u)$.

Similarly for battery 2,

(52.2)    $S_{2f}^{2} = \sum_{k=1}^{n_2} \sum_{K=1}^{n_2} w_{kf} w_{Kf} r_{kK}$,    $(f \le u)$;

(53.2)    $\sum_{q=u+1}^{n_2} c_{qq} = n_2 - \sum_{f=1}^{u} S_{2f}^{2}$,    $(q > u)$.

If the vectors $v_p$ (p > u) are chosen so that the $c_{pp}$ equal a constant,
then from (53.1),

(54.1)    $c_{pp} = \frac{1}{n_1 - u} \left( n_1 - \sum_{f=1}^{u} S_{1f}^{2} \right)$,    $(p > u)$.

Similarly for battery 2,

(54.2)    $c_{qq} = \frac{1}{n_2 - u} \left( n_2 - \sum_{f=1}^{u} S_{2f}^{2} \right)$,    $(q > u)$.

The relation between $r_{pq}$ and $c_{pq}$ is

(55)    $r_{pq}^{2} = \frac{c_{pq}^{2}}{c_{pp}\, c_{qq}}$,    $(p, q > u)$.

From (36),

(56)    $\phi_{pq} = N \frac{c_{pq}^{2}}{c_{pp}\, c_{qq}}$.

A total coefficient, $\Phi_u$, may be obtained by rewriting (42) as

(57)    $\Phi_u = \sum_{p=u+1}^{n_1} \sum_{q=u+1}^{n_2} \phi_{pq}$.

Substitution from (49), (54.1), (54.2), and (56) yields (34) and its equivalent

(58)    $\Phi_u = \frac{N (n_1 - u)(n_2 - u)}{\left( n_1 - \sum_{f=1}^{u} S_{1f}^{2} \right) \left( n_2 - \sum_{f=1}^{u} S_{2f}^{2} \right)} \left( \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^{2} - \sum_{f=1}^{u} \gamma_f^{2} \right)$.

In the situation when the population covariances $c_{pq}$ (between batteries),
$c_{pP}$ (within battery 1), and $c_{qQ}$ (within battery 2) are all zero for
p > u, $\Phi_u$ is approximately distributed as $\chi^2$ with
$(n_1 - u)(n_2 - u)$ degrees of freedom. This yields a series of statistical
tests for a minimum number of factors parallel to the statistical test
previously discussed for the original correlation matrix. Each of these tests
yields an indication whether any more factors are justified by the given data
beyond those factors already determined.
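The residual test can be sketched as a small function. It follows the relation read from (54) to (58): the residual sum of squares divided by the product of the two constant residual variances, multiplied by N. All argument names and the numerical inputs are hypothetical; in practice the roots and the variances of the first u composites would come from the factoring itself.

```python
def phi_u(n, n1, n2, u, sum_r2, gammas, s1_sq, s2_sq):
    """Residual coefficient after u factors, with its degrees of freedom.

    sum_r2 : sum of squared inter-battery correlations
    gammas : diagonal entries of C_pq for the factors already determined
    s1_sq, s2_sq : variances of those composites in batteries 1 and 2
    """
    residual_ss = sum_r2 - sum(g * g for g in gammas[:u])
    var1 = (n1 - sum(s1_sq[:u])) / (n1 - u)   # constant residual variance, battery 1
    var2 = (n2 - sum(s2_sq[:u])) / (n2 - u)   # constant residual variance, battery 2
    df = (n1 - u) * (n2 - u)
    return n * residual_ss / (var1 * var2), df

# Hypothetical values for a 4-test and a 5-test battery.
value, df = phi_u(n=200, n1=4, n2=5, u=1, sum_r2=2.5,
                  gammas=[1.2, 0.9], s1_sq=[1.6, 1.3], s2_sq=[1.8, 1.4])
print(value, df)
```

With u = 0 the function reduces to N times the sum of squared correlations with n_1 n_2 degrees of freedom, which is the test for the original matrix.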





Rotation of Axes 


As noted earlier, the matrices $A_1$ and $A_2$ may be submitted to trans-
formation and (5) will still be true for the transformed matrices. Let $M_1$
and $M_2$ be two $u \times u$ nonsingular transformation matrices; u is now
taken as the number of common factors. Let

(59.1)    $B_1 = A_1 M_1$,

(59.2)    $B_2 = A_2 M_2$.

Substitution of (59.1) and (59.2) into (5) yields

(60)    $R_{12} = B_1 (M_2' M_1)^{-1} B_2'$.

In case $M_2'$ is the inverse of $M_1$, (60) will simplify to the form of (5) with

(61)    $R_{12} = B_1 B_2'$.


Since $M_1$ and $M_2$ are not restricted to being orthogonal matrices, the
transformation problem has a more general form than occurs for the usual type
of factor analysis. A particular special case of interest occurs when $M_1$
and $M_2$ are diagonal matrices. The effect is to change the entries in the
columns of $A_1$ and $A_2$ proportionately. When one column of $A_1$ is
multiplied by one constant, the corresponding column of $A_2$ is multiplied by
the reciprocal of this constant. Thus, the scaling of the columns of factor
loadings is not unique.

Consider a second special case where $(M_2' M_1)^{-1}$ is symmetric and has
unity in the diagonal cells. If


(62) R, = (MzM,)", 
(60) becomes 
(63) Ris _ B,R,B; ’ 


which corresponds to the fundamental factor equation for correlated factors 
({26], p. 354). 

The preceding cases require that the matrix $(M_2' M_1)^{-1}$ be symmetrical.
No satisfactory method has been devised for transformation of the matrices
which incorporates this restriction. In one sense, however, it is desirable to
treat the matrices $A_1$ and $A_2$ separately. Any lack of conformity in these
two operations will result in lowered factor reliability coefficients. In
terms of a critical evaluation of factorial stability, equivalence of
transformations as determined separately for the two batteries constitutes an
important aspect. Low factor reliability coefficients may reflect, in part,
nonequivalent transformations of the factor matrices. Greater assurance of
stable factor determination from two batteries results from high factor
reliability coefficients.

A graphical method of rotation of axes was employed for the illustrative
example in determining the transformation matrices $M_1$ and $M_2$. Figure 1
presents the graphs used for each of the two batteries. Each of the factors in
the A matrix was taken as an orthogonal axis and points were plotted for
each variable using the entries in the A matrix as coordinates. Oblique lines
were drawn through clusters of points and the normals to these lines were
constructed. The new factors were assigned the code identification of $A_1$
and $B_1$ for battery 1 and $A_2$ and $B_2$ for battery 2. The factors $A_1$
and $A_2$ have word fluency tests (see Table 1) with nonzero projections on
the normals. The factors $B_1$ and $B_2$ have verbal tests with nonzero
projections on their normals. These factors were, therefore, paired as
indicated. The direction cosines of the normals were recorded in the M
matrices of Table 5. The B matrices of Table 5 were computed by (59.1) and
(59.2). The entries in the B matrices are the projections of the variables on
the normals drawn on the graphs of Figure 1. The matrix $(M_2' M_1)^{-1}$ is
given also in Table 5. It will be noted that this matrix does not have unity
in the diagonal cells and is not symmetric. Some slight differences are
indicated in the comparability of rotation of axes for the two batteries by
the lack of symmetry.
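The matrix $(M_2' M_1)^{-1}$ can be recomputed from the two transformation matrices as a check on the remark about symmetry. The sketch below uses the direction cosines as read from Table 5 (rows I and II, columns A and B); the helper functions are hypothetical and cover only the 2 x 2 case.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Direction cosines as read from Table 5.
m1 = [[0.434, 0.598], [0.901, -0.802]]   # battery 1: columns A_1, B_1
m2 = [[0.440, 0.633], [0.898, -0.774]]   # battery 2: columns A_2, B_2

g = inverse_2x2(matmul(transpose(m2), m1))
for row in g:
    print([round(v, 3) for v in row])
```

The off-diagonal entries differ from each other and the diagonal entries exceed unity, which is the lack of symmetry and of unit diagonal that the text describes.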


Factor Reliability Coefficient 


The final phase of the inter-battery method is the determination of the
factor reliability coefficients. In this context, the entries in the B matrices
will be considered as regression weights for predicting scores on the given
variables in one battery from composite scores obtained from the variables
in the other battery. The predictor composite scores are determined from a
transformation of the W matrices, this transformation depending on the M
matrices used in transforming the A matrices. Let, in this context,

(64.1)    $V_1 = W_1 T_1$,
(64.2)    $V_2 = W_2 T_2$.

[Figure 1: two panels, "Battery 1" and "Battery 2", each plotting the
variables against axes I and II, with the normals A and B drawn in.]

Figure 1
Graphs for Rotation of Factors













TABLE 5

Rotated Inter-Battery Factors

                  Battery 1                                   Battery 2

  M_1                    B_1                     M_2                    B_2
  (Transformation        (Projections            (Transformation        (Projections
  Matrix)                on Normals)             Matrix)                on Normals)

        A_1    B_1       Test   A_1    B_1             A_2    B_2       Test   A_2    B_2
  I    .434   .598        21   .670  -.029       I    .440   .633        23   .564   .003
  II   .901  -.802        54   .674  -.023       II   .898  -.774        24   .630  -.014
                          45   .003   .771                               27   .440   .135
                          46   .003   .715                               10   .015   .751
                                                                         51  -.012   .760











  $(M_2' M_1)^{-1}$

         A_2     B_2
  A_1   1.240    .567
  B_1    .524   1.241








The definitions of the V matrices are different from those given in the section
on the statistical test for the factors determined. Equations (44) to (46), how-
ever, are still appropriate for the new definition of the V matrices. The T
matrices are $u \times u$ nonsingular matrices to be defined from the M matrices.

Equations (65) give the formulas for determining the covariances of the
tests in one battery with the composites in the other battery as determined
by the V's.

(65.1)    $C_{jq} = R_{12} V_2$,
(65.2)    $C_{kp} = R_{12}' V_1$.

As before, tests j and composites p are for battery 1, tests k and composites
q are for battery 2. Let $B_{j.q}$ be the matrix of regression weights for
predicting scores on tests j in battery 1 from the composites q in battery 2.
Then, from standard regression theory,

(66.1)    $B_{j.q} = C_{jq} C_{2q}^{-1}$.

Similarly,

(66.2)    $B_{k.p} = C_{kp} C_{1p}^{-1}$.

It is desired to define the T matrices such that the rotated factor matrices
$B_1$ and $B_2$ are the matrices of regression weights.

(67.1)    $B_1 = B_{j.q}$;

(67.2)    $B_2 = B_{k.p}$.

This may be accomplished by defining

(68.1)    $T_1 = (W_1' R_{11} W_1)^{-1} \gamma^{1/2} (M_2')^{-1}$,
(68.2)    $T_2 = (W_2' R_{22} W_2)^{-1} \gamma^{1/2} (M_1')^{-1}$.


The appropriateness of these definitions may be checked by matrix algebra 
using equations (15), (23), (45), (59), (64), (65), (66), (67), and (68). Because 
of the length of this matrix manipulation it will not be presented here. 
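The omitted check can be outlined for the definition of $T_2$; the argument for $T_1$ is symmetric. The outline below is a sketch that assumes forms for equations outside this section which are not confirmed here: that the retained vectors satisfy $W_2' W_2 = I$ and $R_{12} W_2 = W_1 \gamma$ with $\gamma$ diagonal, and that the unrotated loadings are $A_1 = W_1 \gamma^{1/2}$.

```latex
\begin{align*}
C_{jq} &= R_{12} V_2 = R_{12} W_2 T_2 = W_1 \gamma\, T_2, \\
C_{2q} &= V_2' R_{22} V_2 = T_2' \,(W_2' R_{22} W_2)\, T_2, \\
B_{j.q} &= C_{jq} C_{2q}^{-1}
         = W_1 \gamma\, (W_2' R_{22} W_2)^{-1} (T_2')^{-1}.
\end{align*}
% Requiring B_{j.q} = B_1 = A_1 M_1 = W_1 \gamma^{1/2} M_1 gives
\begin{align*}
\gamma^{1/2} (W_2' R_{22} W_2)^{-1} (T_2')^{-1} &= M_1, \\
T_2' &= M_1^{-1} \gamma^{1/2} (W_2' R_{22} W_2)^{-1}, \\
T_2 &= (W_2' R_{22} W_2)^{-1} \gamma^{1/2} (M_1')^{-1}.
\end{align*}
```

The last line uses the symmetry of $W_2' R_{22} W_2$ and of the diagonal matrix $\gamma$ when transposing.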

A simplified formula for computation of $C_{1p}$ may be obtained by sub-
stitution of (64.1) and (68.1) into (45.1). Then

(69.1)    $C_{1p} = M_2^{-1} \gamma^{1/2} T_1$.

Similarly

(69.2)    $C_{2q} = M_1^{-1} \gamma^{1/2} T_2$.

A similar substitution into (46) yields

(70)    $C_{pq} = (T_1' \gamma^{1/2})(\gamma^{1/2} T_2)$.


Scores on the composites p and q as defined in (44.1) and (44.2) from
the V matrices will be considered as scores on the factors. Justification of
this step lies in the fact that the basic postulate of the relation of observed
scores to factor scores corresponds to a linear regression for prediction of the
observed scores from the factor scores. This is precisely the situation in the
present case. The entries in $B_1$ are the regression weights for predicting
the observed scores $s_{ji}$ for battery 1 from the composite scores $x_{qi}$
for battery 2. Similarly, the entries in $B_2$ are the regression weights for
predicting the observed scores $s_{ki}$ for battery 2 from the composite
scores $x_{pi}$ for battery 1. As a consequence, the correlations between the
composites $x_{pi}$ and $x_{qi}$ are the inter-battery factor correlations.
These correlations may be entered in a matrix $R_{pq}$, which may be
determined from the covariance matrix $C_{pq}$. Corresponding variances of the
composites appear in the diagonal cells of the matrices $C_{1p}$ and $C_{2q}$.
The correlations between corresponding factors in the two batteries may be
interpreted as the factor reliability coefficients.

Table 6 gives the two T matrices, the C matrices, and the $R_{pq}$ matrix for
the illustrative problem. The T matrices were computed by (68) from values
obtained during the factoring and the rotation of axes. Equations (69) and
(70) were employed to obtain the C matrices. Variances for the factors are
in the diagonal cells of $C_{1p}$ and $C_{2q}$, which are used along with the
entries in $C_{pq}$ to determine the correlations in $R_{pq}$. The factor
reliability coefficients are located in the diagonal cells of $R_{pq}$.

TABLE 6

Inter-Battery Factor Correlations

  T_1                            T_2
          A_1      B_1                   A_2      B_2
  I_1    .5457    .6329          I_2    .4402    .6050
  II_1   .4844   -.3362          II_2   .3818   -.5242

  C_1p                           C_2q
          A_1      B_1                   A_2      B_2
  A_1    .914     .5466          A_2    .7574    .4597
  B_1    .5466    .9912          B_2    .4597   1.0536

  C_pq                           R_pq
          A_2      B_2                   A_2      B_2
  A_1    .5766    .4493          A_1    .693     .458
  B_1    .4372    .8390          B_1    .505     .821
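The step from the C matrices to $R_{pq}$ is the usual conversion of covariances to correlations, which can be sketched as follows. The entries are read from Table 6; two of them (the variance of $A_1$, taken as .914, and the $A_1$, $B_2$ covariance, taken as .4493) are only partly legible and are assumptions of this sketch. The function name is hypothetical.

```python
import math

def factor_correlations(c_pq, c_1p, c_2q):
    """Divide each covariance by the two composite standard deviations."""
    return [[c_pq[p][q] / math.sqrt(c_1p[p][p] * c_2q[q][q])
             for q in range(len(c_2q))]
            for p in range(len(c_1p))]

# Values as read from Table 6; 0.914 and 0.4493 are partly legible entries.
c_1p = [[0.914, 0.5466], [0.5466, 0.9912]]
c_2q = [[0.7574, 0.4597], [0.4597, 1.0536]]
c_pq = [[0.5766, 0.4493], [0.4372, 0.8390]]

r_pq = factor_correlations(c_pq, c_1p, c_2q)
for row in r_pq:
    print([round(v, 3) for v in row])
```

The diagonal of the result gives the factor reliability coefficients for factors A and B, and the off-diagonal cells give the cross-battery correlations between unlike factors.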

The coefficients of .693 for factor A and .821 for factor B, when compared 
with test reliabilities, seem low. Whether they are to be considered as low for 
factors depends upon further experience and consideration of their character- 
istics in terms of the purpose of the experimenter. Use of tests with low 
reliability will lower the factor reliability coefficients. If the tests have low 
communalities, the factor reliability coefficients will also be low. Large 
measurement error variances and large specific factor variances will result in 
low factor reliability coefficients. In the present example it is apparent that 
stability of the determination of the word fluency factor is considerably 
less than that of the verbal factor. This indicates the need for construction of 
better tests for word fluency. 













Use of the inter-battery method of factor analysis as presented in the 
preceding material should help in firmly establishing factors for the descrip- 
tion of human behavior. The controversial point of the estimation of com- 
munalities is completely avoided. A statistical test for judging the minimum 
number of factors justified by the data is provided. The factor reliability 
coefficients should be of assistance as an indication of the extent that factors 
may be determined from different test material. This latter is related to the 
extremely important psychological problem of generalizing factorial results 
over other behavior of people than is involved directly in the tests employed 
in the analysis. 


REFERENCES 


[1] Bartlett, M. S. Internal and external factor analysis. Brit. J. Psychol., Statist. Sect., 
1948, 1, 73-81. 

[2] Bartlett, M. S. Tests of significance in factor analysis. Brit. J. Psychol., Statist. Sect., 
1950, 3, 77-85. 

[3] Bartlett, M. S. The effect of standardization on a x? approximation in factor analysis. 
Biometrika, 1951, 38, 337-344. 

[4] Bartlett, M. S. A further note on tests of significance in factor analysis. Brit. J. Psychol., 
Statist. Sect., 1951, 4, 1-2. 

[5] Burt, C. A comparison of factor analysis and analysis of variance. Brit. J. Psychol., 
Statist. Sect., 1947, 1, 3-26. 

[6] Burt, C. Tests of significance in factor analysis. Brit. J. Psychol., Statist. Sect., 1952, 
5, 109-133. 

[7] Cattell, R. B. A universal index for psychological factors. Advance Publication No. 3, 
Dec. 1953. Laboratory of Personality Assessment and Group Behavior, Dept. 
Psychology, Univ. Illinois. 

[8] Eckart, C. and Young, G. The approximation of one matrix by another of lower rank. 
Psychometrika, 1936, 1, 211-218. 

[9] Emmett, W. G. Factor analysis by Lawley’s method of maximum likelihood. Brit. 
J. Psychol., Statist. Sect., 1949, 2, 90-97. 

[10] Fisher, R. A. Frequency distribution of the values of the correlation coefficient in 
samples from an indefinitely large population. Biometrika, 1915, 10, 507-521. 

[11] French, J. W. The description of aptitude and achievement factors in terms of rotated 
factors. Psychometric Monogr. No. 5, 1951. Chicago: Univ. Chicago Press. 

[12] French, J. W. The selection of standard tests for factor analysis. Amer. Psychologist, 
1952, 7, 297. (Abstract) 

[13] French, J. W. Kit of selected tests for reference aptitude and achievement factors. 
Princeton: Educational Testing Service, 1954. 

[14] Henrysson, S. The significance of factor loadings. Brit. J. Psychol., Statist. Sect., 
1950, 3, 159-165. 

[15] Hoel, P. G. A significance test for minimum rank in factor analysis. Psychometrika, 
1939, 4, 149-158.

[16] Hotelling, H. The most predictable criterion. J. educ. Psychol., 1935, 26, 139-142. 

[17] Lawley, D. N. The estimation of factor loadings by the method of maximum likelihood. 
Proc. Roy. Soc. Edin., 1940, 60, 64-82. 

[18] Lawley, D. N. Further investigations in factor estimation. Proc. Roy. Soc. Edin., 
1942, 61, 176-185. 

[19] Lawley, D. N. Problems in factor analysis. Proc. Roy. Soc. Edin., 1949, 62, 394-399. 










[20] Lawley, D. N. Factor analysis by maximum likelihood: a correction. Brit. J. Psychol., 
Statist. Sect., 1950, 3, 76. 

[21] McNemar, Q. On the sampling errors of factor loadings. Psychometrika, 1941, 6, 
141-152. 

[22] McNemar, Q. On the number of factors. Psychometrika, 1942, 7, 9-18. 

[23] McNemar, Q. The factors in factoring behavior. Psychometrika, 1951, 16, 353-359. 

[24] Rao, C. R. Estimation and tests of significance in factor analysis. Psychometrika, 
1955, 20, 93-111. 

[25] Rippe, D. D. Application of a large sampling criterion to some sampling problems 
in factor analysis. Psychometrika, 1953, 18, 191-205. 

[26] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947. 

[27] Thurstone, L. L. and Thurstone, T. G. Factorial studies of intelligence. Psychometric 
Monogr. No. 2, 1941. Chicago: Univ. Chicago Press.

[28] Wold, H. Some artificial experiments in factor analysis. In: Uppsala Symposium on 
Psychological Factor Analysis, 17-19, March 1953. Nordisk Psykologi’s Monograph 
Series No. 3. Pp. 43-64. 

[29] Young, G. Maximum likelihood estimation and factor analysis. Psychometrika, 1941, 
6, 49-53. 


Manuscript received 2/18/57 
Revised manuscript received 8/26/57 











PSYCHOMETRIKA—VOL. 23, NO. 2 
JUNE, 1958 


COMPARATAL DISPERSION, A MEASURE OF ACCURACY 
OF JUDGMENT* 
HAROLD GULLIKSEN 
PRINCETON UNIVERSITY 
AND 
EDUCATIONAL TESTING SERVICE 

It is suggested that the ambiguity of a set of paired comparison judg-
ments may be measured by the quantity
$\sqrt{\sigma_i^{2} + \sigma_j^{2} - 2 r_{ij} \sigma_i \sigma_j}$. This quantity
is termed the comparatal dispersion. A simultaneous solution for scale values
and ratios of comparatal dispersions has been presented and applied to some
data on food preferences.





The discriminal dispersion may be taken to represent the ambiguity
of a single stimulus. The greater the ambiguity, the greater the discriminal
dispersion, and the nearer to .5 will be the proportion of judgments "i less
than j" for any comparison involving that stimulus.

The Law of Comparative Judgment [8, 9, 10] states that

$s_i - s_j = z_{ij} \sqrt{\sigma_i^{2} + \sigma_j^{2} - 2 r_{ij} \sigma_i \sigma_j}$,

where

$s_i$ and $s_j$ are scale values of the stimuli;

$\sigma_i$ and $\sigma_j$ are discriminal dispersions of the stimuli;

$r_{ij}$ is the correlation between judgments for the two stimuli;

$z_{ij}$ is the normal deviate corresponding to $p_{ij}$;

$p_{ij}$ is the proportion of judgments i < j.
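The two ingredients of the law can be sketched numerically: the radical term, and the conversion of an observed proportion into a normal deviate. The function names are hypothetical, and the numerical inputs are illustrative rather than taken from the food preference data. The sketch returns the product of the deviate and the dispersion without committing to which stimulus is subtracted from which.

```python
import math
from statistics import NormalDist

def comparatal_dispersion(sigma_i, sigma_j, r_ij):
    """The radical term of the Law of Comparative Judgment."""
    return math.sqrt(sigma_i ** 2 + sigma_j ** 2 - 2.0 * r_ij * sigma_i * sigma_j)

def scale_separation(p_ij, dispersion):
    """Normal deviate for the proportion p_ij, multiplied by the dispersion."""
    z_ij = NormalDist().inv_cdf(p_ij)
    return z_ij * dispersion

# Illustrative case: equal unit dispersions, uncorrelated judgments.
d = comparatal_dispersion(1.0, 1.0, 0.0)
print(d, scale_separation(0.80, d))
```

A proportion of .5 gives a zero deviate, so two stimuli judged equally often in each direction receive the same scale value whatever the dispersion.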


So far no feasible solution for the $\sigma$'s (or the discriminal dispersions)
has been proposed. However, the radical term can be solved for in certain
cases. The quantity $\sqrt{\sigma_i^{2} + \sigma_j^{2} - 2 r_{ij} \sigma_i \sigma_j}$,
which represents the ambiguity of the total comparative judgment, will be
termed the comparatal dispersion.

The problem of the present paper is "Does the magnitude of the com-
paratal dispersion change with variation in the complexity of the judgment?"

In this study the complexity of the judgment was varied by using what
may be termed unitary stimuli together with what we will term composite
stimuli [11]. Using these two different types of stimuli gives the possibility
for three different degrees of "complexity of judgment" as illustrated in
Table 1.

*This research was jointly supported in part by Princeton University, the Office of
Naval Research under contract Nonr-1858(15), and the National Science Foundation
under grant NSF G-642, and in part by Educational Testing Service. Reproduction in
whole or in part is permitted for any purpose of the United States Government.

In item A, one unitary stimulus (Loin Lamb Chop) is paired with another 
unitary stimulus (Sirloin Steak). Since two unitary stimuli are involved for 
the judgment required in item A, we may say that this represents a judgment 
of complexity 2, which will be indicated by the subscript 2. 


TABLE 1

Illustrative Stimuli

                                                                      Complexity
  Item   Stimulus                                                     of Judgment

   A     [ ] Loin Lamb Chop                                Unitary        2
         [ ] Sirloin Steak                                 Unitary

   B     [ ] Loin Lamb Chop and Roast Rib of Prime Beef    Composite      4
         [ ] Sirloin Steak and Roast Loin of Pork          Composite

   C     [ ] Sirloin Steak                                 Unitary        3
         [ ] Loin Lamb Chop and Boiled Smoked Beef Tongue  Composite





In item B, the subject is asked to choose between one composite stimulus 
(Loin Lamb Chop and Roast Rib of Prime Beef) and another composite 
stimulus (Sirloin Steak and Roast Loin of Pork). Since four unitary stimuli 
are involved for the judgment required in item B, we will designate these 
by the subscript 4 and say that this represents a judgment of complexity 4. 

Similarly, item C involves a comparison of a unitary and a composite 
stimulus. We will call this a judgment of complexity 3 since three unitary 
stimuli are involved in the judgment, and designate these by the subscript 3. 

The total schedule used involved five unitary stimuli (Lamb, Steak,
Beef, Pork, and Tongue) together with the ten composites of these five. All
possible pairs of these fifteen stimuli were used, except those which involved
the repetition of the same stimulus. In other words, items of the form "Do
you prefer Beef and Lamb, or Beef and Pork?" were omitted, since it was
felt that some persons might judge one composite against the other composite,
while some might decide to ignore the common element and simply give a
comparison between (in this illustration) Lamb and Pork. We also omitted
items of the form "Do you prefer Steak, or Steak and Tongue?" With these
omissions, the total schedule was composed of ten items of complexity 2,
thirty items of complexity 3, and fifteen of complexity 4. These items were
randomly interspersed in the schedule given to the subjects.
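The item counts follow directly from the omission rule and can be reproduced by enumeration. The sketch below represents each stimulus as a set of unitary stimuli and keeps a pair only when the two sets share no element; the variable and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

unitary = ["Lamb", "Steak", "Beef", "Pork", "Tongue"]

# Five unitary stimuli plus the ten two-dish composites.
stimuli = [frozenset([u]) for u in unitary]
stimuli += [frozenset(pair) for pair in combinations(unitary, 2)]

def build_schedule(stimuli):
    """All pairs of stimuli that do not repeat a unitary stimulus."""
    return [(a, b) for a, b in combinations(stimuli, 2) if not (a & b)]

items = build_schedule(stimuli)

# Complexity of a judgment = number of unitary stimuli involved.
counts = Counter(len(a) + len(b) for a, b in items)
print(len(items), dict(sorted(counts.items())))
```

The enumeration gives 55 items in all: ten of complexity 2, thirty of complexity 3, and fifteen of complexity 4, in agreement with the schedule described.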

In the directions to the subjects, it was stressed that if two main dishes 
were presented in a choice, each was an ordinary-sized serving, and that it 
was to be assumed that both must be eaten, thus giving twice as much as if 
only a single dish were chosen. 

The responses to this schedule were analyzed to determine which of 
several laws of value increase were in agreement with the data and the results 
reported in [3]. Here we are presenting another analysis of the same data to 
see if the complexity of judgment influences the comparatal dispersion. 

The observed choices of the 92 college students are shown in Table 2, 
the corresponding normal deviates and relevant sums in Table 3. The in- 
complete data method described in [2] for constructing matrix M and vector 


TABLE 2

Experimental Choices

[Table 2 entries are illegible.]



































TABLE 3

Food Preferences - Experimental Inter-Stimulus Distances (from [illegible] proportions)

[Table 3 entries are illegible.]

TABLE 4

[Table 4, giving entries for the matrix M and the vector, is illegible.]













The thirty judgments of complexity 3 were scaled, giving scale values
for all fifteen stimuli, including the scale values of the five unitary stimuli,
designated $s_{u(3)}$, and the scale values of the ten composite stimuli,
designated $s_{c(3)}$. From the judgments of complexity 2, scale values,
designated $s_{u(2)}$, for the five unitary stimuli were obtained. Thus for the
five unitary stimuli it is possible to compare directly the scale values
determined from judgments of complexity 3, $s_{u(3)}$, with the scale values
determined from judgments of complexity 2, $s_{u(2)}$, as shown in Figure 1.





[Figure 1: scale values of the five unitary stimuli from judgments of
complexity 2 ($s_2$) plotted against those from judgments of complexity 3
($s_3$), with the observed (obs) and theoretical (theor) lines.]

FIGURE 1
Scale Values $s_2$ vs. $s_3$














Correspondingly from the judgments of complexity 4, scale values,
designated $s_{c(4)}$, can be determined for the ten composite stimuli and com-
pared with the scale values, $s_{c(3)}$, determined from judgments of complexity
3, as shown in Figure 2.

If the comparatal dispersions for the judgments from which the $s_{u(2)}$ are
computed are the same as the comparatal dispersions for the judgments from
which the $s_{u(3)}$ are computed, then the standard deviation of the scale
values,

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (s_i - \bar{s})^{2}}{n - 1}}$,

would be the same for $s_{u(2)}$ and $s_{u(3)}$. If a given set of stimuli are
scaled twice, scale values $s_A$ being determined from judgments with a small
comparatal dispersion, while scale values $s_B$ are determined from judgments
with a large comparatal dispersion, then the standard deviation of the $s_A$
values will be larger than the standard deviation of the $s_B$ values, because
the comparatal dispersion serves as a unit of measure. For example, a set of
distances $D_A$ (measured in centimeters) will have a larger standard
deviation than the same set of distances $D_B$ (measured in inches). In
Figure 1 the solid line, $s_2 = 1.46 s_3$, shows the observed relationship
between the scale values determined from judgments of complexity 2 and those
determined from judgments of complexity 3. The slope of this line, 1.46, is an
estimate of the quotient obtained by dividing the comparatal dispersion for
judgments of complexity 3 by the comparatal dispersion for judgments of
complexity 2.

[Figure 2: scale values of the ten composite stimuli from judgments of
complexity 4 ($s_4$) plotted against those from judgments of complexity 3
($s_3$), with the observed (obs) and theoretical (theor) lines.]

FIGURE 2
Scale Values $s_4$ vs. $s_3$

For judgments of complexity 2, one possibility is that the discriminal
dispersions for each of the two stimuli will combine independently to give the
comparatal dispersion for the judgment. According to such a possibility the
comparatal dispersion for judgments of complexity 2 would be
$\sqrt{\sigma_1^2 + \sigma_2^2}$, or $\sigma\sqrt{2}$ if the two discriminal dispersions
were equal. Similarly, if the three dispersions combine independently for
judgments of complexity 3, then the comparatal dispersion will be
$\sqrt{\sigma_1^2 + \sigma_2^2 + \sigma_3^2}$, or $\sigma\sqrt{3}$ for three
equal discriminal dispersions. That is to say, according to this view the
comparatal dispersion would be proportional to the square root of the
complexity of the judgment. If the comparatal dispersions are proportional
to the square root of the complexities, then for $s_{u(2)}$ and $s_{u(3)}$ the
comparatal dispersions are proportional respectively to $\sqrt{2}$ and $\sqrt{3}$.
Using these as the unit of measurement, the ratio of the standard deviations of
the scale values is $\hat{s}_{u(2)}/\hat{s}_{u(3)} = \sqrt{3/2} = 1.22$. The
ratio observed was even larger, $\hat{s}_{u(2)}/\hat{s}_{u(3)} = 1.46$, as shown by the
solid line in Figure 1. The individual stimuli can be identified from Table 6 since
they are in order of magnitude of scale values.
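The arithmetic behind the square-root hypothesis can be checked with a short sketch. It assumes only what the text states, namely that the comparatal dispersion is proportional to the square root of the complexity and that the standard deviation of a set of scale values is inversely proportional to the comparatal dispersion. The function name `predicted_sd_ratio` is hypothetical and introduced only for illustration.

```python
import math


def predicted_sd_ratio(complexity_a, complexity_b):
    """Predicted ratio (SD of scale values at complexity a) / (SD at complexity b).

    Assumes comparatal dispersion is proportional to sqrt(complexity), and that
    the SD of the scale values varies inversely with the comparatal dispersion.
    """
    return math.sqrt(complexity_b / complexity_a)


# Unitary stimuli: complexity 2 versus complexity 3, i.e. sqrt(3/2).
ratio_2_vs_3 = predicted_sd_ratio(2, 3)
# Composite stimuli: complexity 4 versus complexity 3, i.e. sqrt(3/4).
ratio_4_vs_3 = predicted_sd_ratio(4, 3)

print(round(ratio_2_vs_3, 2), round(ratio_4_vs_3, 2))  # prints: 1.22 0.87
```

These are the theoretical values against which the observed ratios of 1.46 and .93 are compared.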

The Pitman-Morgan variance ratio test for correlated data [1, 4, 5, 6] was
used to test the hypothesis that this variance ratio, $(1.46)^2$, is equal to 3/2.
The result was a t of 3.9 with three degrees of freedom, which corresponds
to a p between .01 and .05. Testing the hypothesis that the variance ratio
equals one instead of 3/2 yields t = 10.5 for three degrees of freedom. This is
near the .001 level since the t value for a p of .001 with three degrees of
freedom is only 12.9, while the corresponding value at the .01 level is 5.8.

Thus the hypothesis of equality of comparatal dispersions for $s_{u(2)}$ and
$s_{u(3)}$ is clearly excluded. The ratio of the squares of the comparatal dispersions
is at least 3 to 2 and probably greater. It should be noted that since the two 
sets of data are correlated, the variance ratio test for correlated data was used. 
Using the statistical test appropriate for uncorrelated data will give different 
(and erroneous) results (see [13], p. 26). 
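A minimal sketch of a variance ratio test for correlated data follows. It assumes the usual sum-and-difference form of the Pitman-Morgan statistic (the correlation between x + y and x - y, referred to t with n - 2 degrees of freedom); rescaling y by the square root of the hypothesized ratio is an assumption made here to cover hypotheses other than equality. The function name and the data are hypothetical and are not the values of the present experiment.

```python
import math


def pitman_morgan_t(x, y, ratio=1.0):
    """t statistic and degrees of freedom for H0: Var(x) = ratio * Var(y).

    x and y are paired (correlated) measurements on the same n objects.
    Assumption: y is rescaled by sqrt(ratio) so that H0 becomes equality of
    variances, then the correlation r between the sums and the differences
    is converted to t = r * sqrt(n - 2) / sqrt(1 - r^2).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must be paired and contain at least 3 values")
    y_scaled = [math.sqrt(ratio) * value for value in y]
    u = [a + b for a, b in zip(x, y_scaled)]  # sums
    v = [a - b for a, b in zip(x, y_scaled)]  # differences
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    r = s_uv / math.sqrt(s_uu * s_vv)
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return t, n - 2


# Hypothetical scale values for five stimuli scaled twice (not the paper's data).
x_values = [0.0, 2.0, 4.0, 6.0, 8.0]
y_values = [0.0, 1.0, 2.0, 3.0, 5.0]
t_stat, dof = pitman_morgan_t(x_values, y_values)
```

With five unitary stimuli the statistic has three degrees of freedom, and with ten composite stimuli it has eight, as in the text.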

Turning to the composite stimuli as scaled by judgments of complexity
3 and by judgments of complexity 4, the ratio of the standard deviations of the
scale values is $\hat{s}_{c(4)}/\hat{s}_{c(3)} = .93$, as shown by the solid
line in Figure 2. Each of the ten composite stimuli can be identified from
Table 6, since they are in the order given by their scale values. If the
comparatal dispersions had been proportional to the square root of the
complexities, then for $s_{c(4)}$ and $s_{c(3)}$ the comparatal dispersions would be
proportional respectively to $\sqrt{4}$ and $\sqrt{3}$. Using these as the unit of measurement
we would find that $\hat{s}_{c(4)}/\hat{s}_{c(3)} = \sqrt{3/4} = .87$. On the other hand, if the
comparatal dispersions had been equal, this ratio would have been unity.
The data are consistent with either hypothesis. The Pitman-Morgan variance
ratio test (in each case) gives a t of about 0.7, which for eight degrees of
freedom corresponds to a p of about .5.

An indication of the relationship between the magnitude of the com- 
paratal dispersion and complexity of judgment can be obtained as indicated 
here by scaling the data separately and comparing the variances of the 
stimuli. The Pitman-Morgan variance ratio test for correlated data gives a 
method for testing the correlated variance ratios. However, it is desirable to 
have a method of simultaneously solving for scale values and variance ratios 
to be certain that some sort of best fit to the data has been obtained. A 
simultaneous least squares solution has been found. 













A Least Squares Solution 


In order to develop a least squares solution, regard the experimental
matrix of differences in scale values, $D_{ij}$, as partitioned into four submatrices,
each homogeneous with respect to complexity of judgment, as illustrated in
Table 5. For n unitary stimuli, 1, 2, ..., n, let the scale value differences be
recorded in submatrix $D_{ij2}$ of Table 5, these being from judgments of
complexity 2. The scale value differences for the composite stimuli, n + 1,
n + 2, ..., m [where m = n + (n/2)(n - 1)], are recorded in submatrix $D_{ij4}$
of Table 5, and represent judgments of complexity 4. The judgments of
complexity 3, involving both a unitary and a composite stimulus, give scale value
differences which are recorded in submatrices $D_{ij3}$ and $D'_{ij3}$ of Table 5. In the
present analysis the differences $D_2$ have a weight $w_2$, the differences $D_4$ have a
weight $w_4$, and the differences $D_3$ have unit weight. In order to express all
distances in similar terms these weights should be directly proportional to the
comparatal dispersions or to the unit used. For example, if some distances are
measured in inches and some in feet, then multiplying the latter by 12 will
express all distances in similar terms. Since the comparatal dispersions are
proportional to the reciprocal of the standard deviations of the set of scale
values, $w_2$ and $w_4$ are proportional to the reciprocals of $\hat{s}_{u(2)}$ and
$\hat{s}_{c(4)}$, respectively. Thus we can allow the weight to vary with complexity of judgment
and so determine the effect of complexity of judgment on comparatal
dispersion. It should also be noted that the matrices $D_2$, $D_4$, and $D_3$ may be
matrices of complete or incomplete data. Only the incomplete data case will
be considered here since it generalizes easily to the complete data case, and
since experiments with composite stimuli are likely to be incomplete data
experiments.

TABLE 5

Partitioned Matrix of Differences of Scale Values

                                        j
  i                   1, 2, 3, ..., n        n+1, n+2, n+3, ..., m

  1, 2, ..., n        $w_2 D_{ij2}$          $D_{ij3}$

  n+1, n+2, ..., m    $D'_{ij3}$             $w_4 D_{ij4}$

  m = (n/2)(n - 1) + n
Define E, the error to be minimized, as

$$E = \sum_{i}\sum_{j}^{m_{ij}} \left(w_g D_{ijg} - s_i + s_j\right)^2, \tag{1}$$

where subscript g takes the values 2, 3, and 4 as indicated in Table 5.

$w_g$ represents the unknown weights $w_2$ and $w_4$ and unity, the weight for $D_3$.

$D_{ijg}$ represents the experimentally determined differences in scale values.

$s_i$ and $s_j$ represent unknown scale values.

$m_{ij}$ indicates that the summation is over the incomplete data (as many cells as are available).

Minimizing

$$\sum_{i}\sum_{j}^{m_{ij}} \left[D_{ijg} - w_g(s_i - s_j)\right]^2$$

would present a direct comparison between the experimental value $D_{ijg}$
and the unknown w and s. However, the resulting nonlinear equations
present difficulties not found in the linear solution resulting from (1).

Differentiating with respect to the $s_i$,

$$\frac{1}{2}\frac{\partial E}{\partial s_i} = 2\sum_{j}^{m_i} \left(s_i - s_j - w_g D_{ijg}\right). \tag{2}$$

Setting the derivative equal to zero and separating the terms involving
$w_2$ from those involving $w_4$,

$$m_i s_i - \sum_{j}^{m_i} s_j - w_2 \sum_{j=1}^{n} D_{ij2} = \sum_{j=n+1}^{m} D_{ij3}, \qquad (i = 1, 2, \ldots, n), \tag{3}$$

and

$$m_i s_i - \sum_{j}^{m_i} s_j - w_4 \sum_{j=n+1}^{m} D_{ij4} = \sum_{j=1}^{n} D'_{ij3}, \qquad (i = n+1, n+2, \ldots, m). \tag{4}$$

$m_i$ indicates the number of cells in the row (or column) containing
observations involving $s_i$.
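Differentiating with respect to the $w_g$,

$$\frac{1}{2}\frac{\partial E}{\partial w_g} = \sum_{i}\sum_{j} \left(w_g D_{ijg}^2 - s_i D_{ijg} + s_j D_{ijg}\right). \tag{5}$$

Setting the derivative equal to zero and separating the terms involving $w_2$
from those involving $w_4$,

$$w_2 \sum_{i=1}^{n}\sum_{j=1}^{n} D_{ij2}^2 - 2\sum_{i=1}^{n}\left(s_i \sum_{j=1}^{n} D_{ij2}\right) = 0, \tag{6}$$

and

$$w_4 \sum_{i=n+1}^{m}\sum_{j=n+1}^{m} D_{ij4}^2 - 2\sum_{i=n+1}^{m}\left(s_i \sum_{j=n+1}^{m} D_{ij4}\right) = 0. \tag{7}$$

Designating the double summations by $V_2$ and $V_4$, respectively, (6)
and (7) may be written

$$w_2 V_2 - 2\sum_{i=1}^{n}\left(s_i \sum_{j=1}^{n} D_{ij2}\right) = 0, \tag{8}$$

and

$$w_4 V_4 - 2\sum_{i=n+1}^{m}\left(s_i \sum_{j=n+1}^{m} D_{ij4}\right) = 0. \tag{9}$$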


Equations (3), (4), (8), and (9) constitute a set of m + 2 equations in
m + 2 unknowns ($s_1, s_2, \ldots, s_m$; $w_2$ and $w_4$), which may be designated
as the unknown column vector X. Define the following row sums from the
matrices $D_{ij2}$, $D_{ij3}$, $D'_{ij3}$, and $D_{ij4}$:

$$z_{i2} = -\sum_{j=1}^{n} D_{ij2}, \qquad (i = 1, 2, \ldots, n).$$

$$z_{i4} = -\sum_{j=n+1}^{m} D_{ij4}, \qquad (i = n+1, n+2, \ldots, m).$$

$$z_{i3} = +\sum_{j=n+1}^{m} D_{ij3}, \qquad (i = 1, 2, \ldots, n).$$

$$z_{i3} = +\sum_{j=1}^{n} D'_{ij3}, \qquad (i = n+1, n+2, \ldots, m).$$













Define the column vector $Z_3$ containing m + 2 elements as the transpose of

$$(z_{13}, z_{23}, \ldots, z_{n3}, z_{(n+1)3}, z_{(n+2)3}, \ldots, z_{m3}, 0, 0).$$

Matrix M is an m × m matrix constructed from $D_{ijg}$ as follows:

put -1 in each off-diagonal cell where data exist;

put 0 in each off-diagonal cell where no data exist;

each diagonal entry is $m_i$, the negative sum of the nondiagonal elements
in the row (or column).

M is symmetric and each row and each column sums to zero. Matrix F is an
(m + 2) × (m + 2) matrix constructed by bordering M with an (m + 1)th
column

$$(z_{12}, z_{22}, \ldots, z_{n2}, 0, 0, \ldots, 0)'$$

and an (m + 1)th row

$$(2z_{12}, 2z_{22}, \ldots, 2z_{n2}, 0, 0, \ldots, 0),$$

as well as an (m + 2)th column

$$(0, 0, \ldots, 0, z_{(n+1)4}, z_{(n+2)4}, \ldots, z_{m4})'$$

and an (m + 2)th row

$$(0, 0, \ldots, 0, 2z_{(n+1)4}, 2z_{(n+2)4}, \ldots, 2z_{m4}).$$

The lower right corner is completed with the 2 × 2 matrix

$$\begin{pmatrix} V_2 & 0 \\ 0 & V_4 \end{pmatrix}.$$

Using these definitions of matrix F and the column vectors X and $Z_3$, (3),
(4), (8), and (9) may be written

$$FX = Z_3. \tag{10}$$

Since the $s_i$ are determined only within an additive constant, we may solve
by setting some $s_i$, say $s_1$, equal to zero, deleting the first element of $Z_3$
and the first row and column of F, giving

$$F_{11} X_1 = Z_{31}, \tag{11}$$

which has the solution

$$X_1 = F_{11}^{-1} Z_{31}. \tag{12}$$
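The construction of F and the solution (12) can be sketched in a few lines. This is an illustration under stated assumptions, not the computing routine used for the paper: the function names, the use of `None` for empty cells, the plain Gaussian elimination, and the small error-free data set (three unitary and three composite stimuli) are all hypothetical. With error-free data generated as $D_{ijg} = (s_i - s_j)/w_g$, the solution should reproduce the generating values.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def least_squares_scale_values(D, n):
    """Build F and Z3 as in (10), set s_1 = 0, and solve (11).

    D[i][j] is the observed difference for stimulus i versus j (skew-symmetric),
    or None where no data exist. The first n stimuli are unitary.
    Returns (scale values, w2, w4).
    """
    m = len(D)

    def cell(i, j):
        return 0.0 if (i == j or D[i][j] is None) else D[i][j]

    F = [[0.0] * (m + 2) for _ in range(m + 2)]
    Z = [0.0] * (m + 2)
    for i in range(m):
        same = range(n) if i < n else range(n, m)    # D2 or D4 block
        other = range(n, m) if i < n else range(n)   # D3 or D3' block
        w_index = m if i < n else m + 1              # position of w2 or w4
        for j in range(m):
            if i != j and D[i][j] is not None:
                F[i][j] = -1.0                       # matrix M, off-diagonal
                F[i][i] += 1.0                       # matrix M, diagonal m_i
        z = -sum(cell(i, j) for j in same)           # z_i2 or z_i4
        F[i][w_index] = z                            # bordering column
        F[w_index][i] = 2.0 * z                      # bordering row
        F[w_index][w_index] += sum(cell(i, j) ** 2 for j in same)  # V2 or V4
        Z[i] = sum(cell(i, j) for j in other)        # z_i3
    F11 = [row[1:] for row in F[1:]]                 # delete first row and column
    X1 = solve_linear(F11, Z[1:])
    return [0.0] + X1[:m - 1], X1[m - 1], X1[m]


# Hypothetical error-free data: n = 3 unitary and 3 composite stimuli.
n_unitary = 3
s_true = [0.0, 0.4, 0.9, 0.3, 0.7, 1.2]
w2_true, w4_true = 0.7, 1.1


def true_weight(i, j):
    if i < n_unitary and j < n_unitary:
        return w2_true
    if i >= n_unitary and j >= n_unitary:
        return w4_true
    return 1.0


D_demo = [[None if i == j else (s_true[i] - s_true[j]) / true_weight(i, j)
           for j in range(6)] for i in range(6)]
scale_values, w2_hat, w4_hat = least_squares_scale_values(D_demo, n_unitary)
```

Cells set to `None` are simply skipped when M and the row sums are formed, which is how the incomplete data case enters.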


An Iterative Solution 


An iterative solution for (10) may also be indicated given a first
approximation, say $P_1$, to the solution X. Such a first approximation might be
obtained from scaling the results separately from $D_2$, $D_3$, and $D_4$, and then
computing the $w_2$ and $w_4$ as ratios of standard deviations, as was illustrated
in a previous section. Apart from the constant factor divided out in passing
from (2) to (3), the derivative of (2) with respect to $s_i$ is

$$\frac{1}{4}\frac{\partial^2 E}{\partial s_i^2} = m_i. \tag{13}$$

Likewise from (5),

$$\frac{1}{2}\frac{\partial^2 E}{\partial w_g^2} = \sum_{i}\sum_{j} D_{ijg}^2. \tag{14}$$

One possible iterative procedure is to use the following correction to
obtain the (p + 1)th approximation from the pth approximation, where x
is the solution to the equation $f'(x) = 0$:

$$x_{p+1} = x_p - \frac{f'[x]_p}{f''[x]_p}.$$

The first and second derivatives of f(x) are designated f'(x) and f''(x)
respectively. This iterative procedure is suggested by Ledyard Tucker as a
modification of the scoring method for estimation of parameters given by
Rao ([7], p. 165).

In the present case this procedure may be indicated in matrix form by
defining a row vector of reciprocals of the solutions given in (13) and (14),
as follows:

$$N = \left(\frac{1}{m_1}, \frac{1}{m_2}, \ldots, \frac{1}{m_m}, \frac{1}{V_2}, \frac{1}{V_4}\right), \tag{15}$$

$$P_{p+1} = P_p - kN(FP_p - Z_3) = P_p + kNE_p, \tag{16}$$

where $E_p = Z_3 - FP_p$ is the discrepancy at the pth approximation and the
elements of N are applied to the corresponding elements of $E_p$.

The arbitrary fraction k may be taken as unity unless this value seems to
give unreasonable fluctuations. If so, then convergence can be made smoother
with shorter steps by taking k as some value such as 3/4 or 1/4.
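A single correction step of this kind can be sketched as follows, assuming that N is applied element by element to the discrepancy $Z - FP$. The function name `corrected_approximation` and the 2 × 2 system are hypothetical and serve only to show the mechanics; they are not taken from the paper's data.

```python
def corrected_approximation(F, Z, N, P, k=1.0):
    """Return the next approximation P - k * N * (F P - Z), elementwise in N.

    F is a square matrix (list of rows), Z the right-hand side, N the vector of
    reciprocal second derivatives, P the current approximation, and k the
    arbitrary step fraction.
    """
    residual = [sum(f * p for f, p in zip(row, P)) - z for row, z in zip(F, Z)]
    return [p - k * n_i * r for p, n_i, r in zip(P, N, residual)]


# Hypothetical small system with solution (1, 1); N holds reciprocals of the diagonal.
F_demo = [[2.0, -1.0], [-1.0, 2.0]]
Z_demo = [1.0, 1.0]
N_demo = [0.5, 0.5]

approximation = [0.0, 0.0]
for _ in range(60):
    approximation = corrected_approximation(F_demo, Z_demo, N_demo, approximation)
```

For this small system the error halves at every step with k = 1; choosing k below unity shortens each step, which is the smoothing device mentioned above.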

In solving simultaneously for the scale values and the ratios of standard
deviations for the present data, several methods were tried. The Gauss-Seidel
method ([12], pp. 255-258) seemed to give the most rapid convergence
for the present data. The final solution is given as column X in Table 6,
together with the values for $Z_3$, $FX$, and $(Z_3 - FX)$, thus indicating the
solution for (10).

In Table 6, column Y shows the solution for scale values when it has
been assumed that $w_3 = 1$, $w_2 = \sqrt{2/3} = .8165$, or about 0.8, and
$w_4 = \sqrt{4/3} = 1.1547$, or about 1.2. The values in column X approximate a
best simultaneous solution for the 15 scale values and the values of $w_2$ and
$w_4$. For both $w_2$ and $w_4$ the best value was smaller than the initial guess
indicated above. In the present experiment judgments involving two stimuli
have a clearly smaller comparatal dispersion than those involving three.
However, the change from three-stimulus to four-stimulus judgments does
not seem to affect the comparatal dispersion.













TABLE 6

Solution for Scale Values and Ratios of Comparatal Dispersion

Stimulus        Y             X         $Z_3$      $FX$        $Z_3 - FX$

Tongue (T)      .137          .0475     -10.010    -10.0083    -.0012
Pork (P)        [illegible]   .4463      -5.350     -5.3490    -.0010
Lamb (L)        1.043         .9221      -1.340     -1.3402     .0002
Beef (B)        1.746         1.5844      3.750      3.7497     .0003
Steak (S)       2.197         2.0111      7.090      7.0888     .0012
T+P             .000          0.0000     -4.890     -4.8925     .0025
T+L             .270          .2296      -3.860     -3.8589    -.0011
T+B             .830          .7615       -.910      -.9100     .0000
P+L             .928          .8396       -.890      -.8902     .0002
T+S             1.088         .9890        .060       .0601    -.0001
P+B             1.448         1.5048      1.030      1.0288     .0012
P+S             1.780         1.6046      2.300      2.3002    -.0002
L+B             1.993         1.8090      3.430      3.4304    -.0004
L+S             2.324         2.1006      4.470      4.4703    -.0003
B+S             2.622         2.3382      5.120      5.1210    -.0010
$w_2$                         .6835       0           .0002    -.0002
$w_4$                         1.0022      0           .0028    -.0028

REFERENCES 


[1] Finney, D. J. The distribution of the ratio of estimates of the two variances in a
sample from a normal bi-variate population. Biometrika, 1938, 30, 190-192. 
[2] Gulliksen, H. A least squares solution for paired comparisons with incomplete data. 
Psychometrika, 1956, 21, 125-134. 
[3] Gulliksen, H. Measurement of subjective values. Psychometrika, 1956, 21, 229-244. 
[4] Kenny, D. T. Testing of differences between variances based on correlated variates. 
Canad. J. Psychol., 1953, 7, 25-28. 
[5] Morgan, W. A. A test for the significance of the difference between the two variances 
in a sample from a normal bi-variate population. Biometrika, 1939, 31, 13-19. 
[6] Pitman, E. J. G. A note on normal correlation. Biometrika, 1939, 31, 9-12. 
[7] Rao, C. R. Advanced statistical methods in biometric research. New York: Wiley, 1952. 
[8] Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34, 273-286. 
[9] Thurstone, L. L. A mental unit of measurement. Psychol. Rev., 1927, 34, 415-423. 
[10] Thurstone, L. L. Psychophysical analysis. Amer. J. Psychol., 1927, 38, 368-389. 
[11] Thurstone, L. L. and Jones, L. V. The rational origin for measuring subjective values. 
J. Amer. statist. Ass., 1957, 52, 458-471. 
[12] Whittaker, E. T. and Robinson, G. The calculus of observations. London: Blackie, 1929. 
[13] Yarrow, L. J. The effect of antecedent frustration on projective play. Psychol. Monogr., 
1948, 62, No. 6. 


Manuscript received 7/1/57 
Revised manuscript received 9/25/57 













APPLICATION OF THE QUARTIMAX METHOD OF ROTATION 
TO THURSTONE’S PRIMARY MENTAL ABILITIES STUDY* 


CHARLES WRIGLEY 
MICHIGAN STATE UNIVERSITY 
DAVID R. SAUNDERS
EDUCATIONAL TESTING SERVICE 
AND 
Jack O. NEUHAUS 


UNIVERSITY OF CALIFORNIA 


This study compares a quartimax rotation of the centroid factor load- 
ings for Thurstone’s Primary Mental Abilities Test Battery with factorings of 
the same correlation matrix by Thurstone (simple structure), Zimmerman
(revised simple structure), Holzinger and Harman (bi-factor analysis), and
Eysenck (group factor analysis). The quartimax results agree very closely
with the solutions of Holzinger and Harman and of Eysenck, and reason- 
ably well with the two simple structure analyses. The principal difference is 
the general factor provided by the quartimax solution. Reproduction of the 
factorial structure is sufficiently good to justify its use at least as the first stage 
of rotation. More extensive trial of the method will be needed with more 
varied data before it will be possible to decide whether quartimax factors 
meet psychological requirements sufficiently well without further rotation. 


*We wish to thank Professor L. G. Henyey and the University of California
Computer Center for making the IBM 701 electronic computer available for this study, and
the National Science Foundation for its support of the work of the Computer Center.
Professor H. F. Kaiser of the University of Illinois has made helpful criticisms of the paper,
and Mr. Louis S. Davis of the University of California has assisted with preparation of
the tables. The research was supported in part by the United States Air Force under
Contract No. AF 33 (038)-25726 monitored by the Air Force Personnel and Training
Research Center. Permission is granted for reproduction, translation, publication, use
and disposal in whole and in part by or for the United States Government.

A 704 program for calculation of the quartimax and varimax loadings, prepared
by Professor H. F. Kaiser, is available in the library of [illegible] programs held by the
Computer Center at the University of California (Program No. 464). Mr. J. O. Neuhaus
and Mr. K. W. Dickman have prepared a quartimax program for Illiac at the University
of Illinois. This Illiac program will be usable on three other computers recently built or
under construction: Mistic (Michigan State University), Silliac (University of Sydney),
and the machine being constructed by Iowa State College.

Rotation in factor analysis aims at decreasing the complexity of the
factorial description of the tests or other variables being studied. It is
convenient to describe each such test factorially in terms of one or two large
loadings and as many zero or near-zero loadings as possible. Since methods
for achieving this kind of description have in the past been somewhat
subjective and laborious, there has been rather general agreement that a more
definitive mathematical procedure is required. A logical statement of the
rotational problem is needed which, while adequately encompassing the
psychological objectives, is sufficiently simple and precise to make possible
an analytic mathematical formulation.

The quartimax method attempts to meet these requirements. This 
procedure is one that maximizes the sum of fourth powers of the rotated 
factor loadings. The analytic rotational techniques recently reported by 
Carroll [3], by Neuhaus and Wrigley [12], and by Saunders [14], although 
developed from different lines of reasoning, all reduce in the orthogonal case 
to this simple maximization, so that the name quartimax may be applied to 
any of them. A similar fourth-power rotational criterion has been proposed 
by Ferguson [5]. 

Two practical advantages of this approach are easily seen. First, the 
procedure is objective; two investigators who start with the same data will 
secure the same results. Second, as a corollary of the first, the procedure is 
one that lends itself to machine computation. Quartimax results do not 
have to be the final solution. For those wanting simple structure, whether 
orthogonal or oblique, or some other preferred solution, further rotations 
can be made by conventional methods, starting from the quartimax loadings. 
Alternatively, the maximization function can possibly be modified better 
to meet psychological needs. The purpose of this article is to indicate how 
well quartimax loadings agree with factorial results already published. 

For this purpose a quartimax rotation has been made of Thurstone’s 
Primary Mental Abilities factor loadings [16]. In this study a battery of 57 
tests was administered to 240 college students, 13 centroid factors were 
extracted from the tetrachoric correlations, and 12 of these factors were 
rotated to orthogonal simple structure. Seven were identified with assurance 
and two others tentatively. The factors were given the following names: 
Verbal, Spatial, Numerical, Perceptual, Memory, Word Fluency, Induction, 
Restrictive Reasoning, Deduction. The Primary Mental Abilities study is 
of great historical importance, since Thurstone supplied the first large-scale 
illustration of his newly devised factorial methods. This study also demon- 
strated the practicability of extending factor analysis to batteries of fifty or 
more tests. It is hardly surprising, therefore, that the data have been repeatedly 
reanalyzed, including a bi-factor analysis by Holzinger and Harman [8], a 
group factor analysis by Eysenck [4] using Burt’s method [2], and a revised 
simple structure solution by Zimmerman [20], which started from Thurstone’s 
simple structure loadings and aimed to improve them. 

While Thurstone’s study is historically important, it is a pioneer one 
and may be criticized in terms of present-day standards. The sample of 
subjects is rather small. Furthermore, the use of tetrachoric instead of 
product moment correlations not only means larger standard errors but also 
inconsistencies among the correlations, so that the matrix is non-Gramian 













even with unities in the leading diagonal. The quartimax results reported in 
the next section are therefore only of limited value in study of the organi- 
zation of human abilities. The primary purpose of the paper, however, is 
the methodological one of finding the extent to which the analytic quartimax 
method of rotation approximates the standard nonanalytic procedures. The 
existence of four prior factorial solutions, derived by different methods and 
in terms of different logics, but all maintaining orthogonal reference axes, 
makes the Primary Mental Abilities study a good test of the extent to which 
quartimax results approximate standard rotational ones. 


Procedure 


The quartimax rotation of the full set of 13 centroid factors (Thurstone
rotated only the first twelve) was made on the IBM 701 electronic computer
at the University of California. The machine orders provide for successive
pairings of factors in the order: 1, 2; 1, 3; ...; 1, m; 2, 3; ...; 2, m; ...;
m - 1, m, where m denotes the number of factors. For each pair of factors,
the machine finds the angle of rotation which will give the maximal sum of
fourth powers of loadings for the transformed pair of factors and makes
the transformation whenever the angle is one minute or greater. A set of
m(m - 1)/2 pairings will be called a cycle. Cycles of pairings continue until
the fourth-power sum for the full matrix no longer increases. The procedure
can be proved to yield converging values for the fourth-power sums ([12], p.
83). There is generally a rapid increase in the fourth-power sum at first and a
slower increase later. Table 1 gives the fourth-power sum at the end of each
cycle. Although 15 cycles were required for convergence to seven decimal
places, more than 99 per cent of the increase had been realized by the end
of three cycles. The entire solution required less than ten minutes on the
computer, including reading cards with data into the machine and printing
results. If the 99 per cent increase had sufficed, results could have been
calculated and printed within two or three minutes.

TABLE 1

Rate of Convergence for Fourth-Power Sum: Quartimax
Rotation, Centroid Loadings, Primary Mental Abilities Study

Cycle    Sum of fourth    First differences    Per cent of
         powers           of sums              total increase

  0       9.528296           --                   --
  1      12.344100        2.815804              87.184
  2      12.680771         .336671              97.608
  3      12.738563         .057792              99.397
  4      12.752656         .014093              99.834
  5      12.756190         .003534              99.943
  6      12.757299         .001109              99.977
  7      12.757727         .000428              99.991
  8      12.757905         .000178              99.996
  9      12.757975         .000070              99.998
 10      12.758008         .000033              99.999
 11      12.758023         .000015             100.000
 12      12.758028         .000005
 13      12.758030         .000002
 14      12.758032         .000002
 15      12.758032         .000000
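The cycle of pairwise rotations described above can be sketched as follows. The text does not give the formula by which the machine finds the angle; the closed form used here, $\phi = \tfrac{1}{4}\arctan\left[2\sum uv \,/\, \sum(u^2 - v^2)\right]$ with $u = x^2 - y^2$ and $v = 2xy$, is an assumption obtained by maximizing the fourth-power sum for one pair of columns. The function names, the stopping tolerance, and the small two-factor loading matrix are likewise hypothetical.

```python
import math

ONE_MINUTE = math.radians(1.0 / 60.0)  # rotations smaller than this are skipped


def fourth_power_sum(loadings):
    """Quartimax criterion: sum of fourth powers of all loadings."""
    return sum(a ** 4 for row in loadings for a in row)


def pair_angle(loadings, p, q):
    """Angle maximizing the fourth-power sum for columns p and q (assumed closed form)."""
    numerator = denominator = 0.0
    for row in loadings:
        u = row[p] ** 2 - row[q] ** 2
        v = 2.0 * row[p] * row[q]
        numerator += 2.0 * u * v
        denominator += u * u - v * v
    return 0.25 * math.atan2(numerator, denominator)


def quartimax(loadings, max_cycles=50, tol=1e-7):
    """Rotate pairs 1,2; 1,3; ...; m-1,m in cycles until the criterion stops increasing."""
    A = [list(row) for row in loadings]
    m = len(A[0])
    history = [fourth_power_sum(A)]
    for _ in range(max_cycles):
        for p in range(m - 1):
            for q in range(p + 1, m):
                phi = pair_angle(A, p, q)
                if abs(phi) < ONE_MINUTE:
                    continue
                c, s = math.cos(phi), math.sin(phi)
                for row in A:
                    x, y = row[p], row[q]
                    row[p] = c * x + s * y
                    row[q] = -s * x + c * y
        history.append(fourth_power_sum(A))
        if history[-1] - history[-2] <= tol:
            break
    return A, history


# Hypothetical check: a two-factor simple structure turned through 30 degrees.
angle = math.radians(30.0)
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.5]]
unrotated = [[a * math.cos(angle) - b * math.sin(angle),
              a * math.sin(angle) + b * math.cos(angle)] for a, b in simple]
rotated, criterion_history = quartimax(unrotated)
```

The list `criterion_history` plays the role of the second column of Table 1, one entry per cycle, and should never decrease.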

Because of the high speed of an electronic computer, it is feasible to 
calculate the full set of angles of rotation for each cycle. With punched-card 
equipment this is probably an inefficient procedure, since some rotations in 
each cycle will be very small and increments in the fourth-power sum very 
slight. Bolin ({1], pp. 234-239) has developed a technique to try to speed 
convergence by selecting pairs of factors where rotation promises to make 
the greatest difference. 


Results 


The proportion of high, medium, and low loadings in the various solutions 
will first be considered. The factorial results and interpretations in the five 
solutions will then be compared for the principal factors.


Proportion of High, Medium, and Low Loadings 


Percentages of factor loadings of different sizes and signs in the five
sets of results are summarized in Table 2. These percentages are based on the
nine factors for the quartimax and Zimmerman solutions reported in this
paper, the eight Thurstone factors, and for the eight factors together with
the general factor for the Holzinger and Harman and for the Eysenck
solutions. The quartimax method gives a higher percentage of high loadings
than either simple structure solution, and at the same time more variables
appear in the hyperplanes. These results are to be expected from the
quartimax logic of maximizing the dispersion of the squared loadings (or maximizing
the kurtosis, as Saunders expresses the logic of the method). At the same
time there are nineteen negative loadings of .200 or greater, whereas there
are none in any other solution. Negative loadings have been contrary to most
thinking about aptitudes, although Burt has argued for the meaningfulness
of bipolar factors.

TABLE 2

Percentage of Loadings of Different Signs and Sizes

                              Positive                    Negative
                       .40+   .20-.39   .19-       .40+   .20-.39   .19-

Quartimax              13.1    11.5     42.5        0.2     3.5     29.2
Thurstone              11.8    25.4     43.9        --      --      18.9
Zimmerman               9.9    24.2     47.8        --      --      18.1
Holzinger & Harman     16.4     5.4     78.2        --      --      --
Eysenck                17.0     4.5     78.5        --      --      --
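A tabulation of this kind can be sketched as below. The class boundaries follow the column headings of Table 2; treating a loading of exactly zero as positive is an assumption, and the function names and the short list of loadings are hypothetical. The last line checks the count quoted above: 3.7 per cent (0.2 + 3.5) of the 9 × 57 quartimax loadings is about nineteen.

```python
def size_class(loading):
    """Return (sign, size class) for one loading, using the Table 2 classes."""
    sign = "negative" if loading < 0 else "positive"  # zero counted as positive (assumption)
    magnitude = abs(loading)
    if magnitude >= 0.40:
        size = ".40+"
    elif magnitude >= 0.20:
        size = ".20-.39"
    else:
        size = ".19-"
    return sign, size


def percentage_by_sign_and_size(loadings):
    """Percentage of loadings falling in each (sign, size class) cell."""
    counts = {}
    for value in loadings:
        key = size_class(value)
        counts[key] = counts.get(key, 0) + 1
    return {key: 100.0 * count / len(loadings) for key, count in counts.items()}


# Hypothetical loadings, not taken from the tables.
example = [0.93, 0.25, 0.10, 0.0, -0.05, -0.21, -0.45, 0.41]
percentages = percentage_by_sign_and_size(example)

# Negative quartimax loadings of .20 or more: 3.7 per cent of 9 factors x 57 tests.
negative_count = round(0.037 * 9 * 57)
print(negative_count)  # prints: 19
```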

The group factor analyses of Eysenck and of Holzinger and Harman 
have a still higher proportion of high and low loadings. Group factor methods 
insure a large number of zero loadings, since operations are performed on 
submatrices. On the other hand, loadings obtained by group factor methods 
do not usually give as good a fit to the observed correlations as do those 
obtained by rotation of centroid loadings. 


Comparison of the Factorial Structures 


The main quartimax factors are presented in Tables 3-11. Factors are 
ordered in terms of their contribution to test variance (except that two 
factors in the perceptual area and two in the memory area are presented 
together). In each table tests are arranged in order of size of quartimax 
loadings. Alongside are the loadings for the best matching factors in the 
other solutions. Each table includes all tests with a loading of .400 or greater 
in at least one of the five solutions. This was the size of loading selected 
by Thurstone in his report. In the bi-factor analysis and the group factor 
analysis, all loadings are zero except for tests selected for inclusion in the 
factor. 

General-Verbal Factor (Table 3). This quartimax factor agrees closely 
with Holzinger and Harman’s and Eysenck’s general factors. There is no 
general factor in either simple structure solution. 

The group factor methods provide a Verbal Factor in addition to the 
general factor of “intelligence.” In either simple structure solution, the
Verbal is the largest of all factors. The peculiarity of the quartimax results 
is that the Verbal Factor has merged with the general factor and lost its 
separate identity. If the quartimax loadings are compared with those of 
Holzinger and Harman, they will be seen to be generally higher for verbal 
tests but lower for nonverbal tests. 

The quartimax results do not therefore fully agree with either side in 
the celebrated dispute about the general factor, but represent a midway 
position. Like Holzinger and Harman and like Eysenck, the quartimax 
solution has a general factor; but like Thurstone and Zimmerman, one factor 
rather than two is sufficient to delimit the general-verbal area. Accordingly 
both the general factors of Holzinger and Harman and of Eysenck and the 
Verbal Factors of Thurstone and Zimmerman have been included in Table 3. 

TABLE 3

General-Verbal Factor*

                                                     Loadings
Name of test                     Quartimax   Thurstone     Zimmerman   H & H   Eysenck

60. Vocabulary (Thorndike)         .932        .385          .676       .71     .741
 5. Reading II                     .865        .506          .706       .66     .662
11. Completion                     .826        .333          .541       .64     .669
10. Inventive Opposites            .803        .635          .549       .62     .649
 4. Reading I                      .789        .552          .638       .56     .554
16. Inventive Synonyms             .761        .495          .478       .59     .611
58. Vocabulary (Chicago)           .742        .395          .763       .39     .398
41. Verbal Analogies               .732        .597          .459       .81     .824
 6. Verbal Classification          .726        .301          .313       .82     .814
 7. Word Grouping                  .723        .456          .478       .65     .684
57. Grammar                        .687        .498          .420       .63     .688
44. Pattern Analogies              .670        .179          .248       .77     .772
 2. Theme                          .665        .357          .435       .54     .533
40. Reasoning                      .656        .420          .465       .68     .688
43. Code Words                     .653        .304          .352       .86     .868
56. Spelling                       .636        .386          .433       .46     .497
42. False Premises                 .633        [illegible]   .391       .64     .653
55. Sound Grouping                 .599        .453          .300       .70     .707
14. Disarranged Sentences          .569        .395          .211       .66     .657
47. Initials                       .562        .350          .248       .51     .527
12. Disarranged Words              .548        .102          .239       .56     .605
45. Syllogisms                     .538        .324          .226       .75     .696
21. Form Board                     .514       -.057          .117       .67     .670
 9. Controlled Association         .498        .450          .222       .27     .293
39. Arithmetical Reasoning         .497        .173          .192       .68     .683
13. First and Last Letters         .495        .366          .172       .48     .537
25. Mechanical Movements           .453       -.028          .136       .52     .584
37. Number Series                  .450        .296          .032       .63     .627
49. Word Recognition               .438        .035          .161       .47     .472
35. Tabular Completion             .420        .132          .066       .57     .565
28. Copying                        .400       -.045         -.086       .58     .575
29. Areas                          .396        .030          .038       .58     .561
24. Punched Holes                  .393       -.028          .140       .57     .565
26. Identical Forms                .391       -.017          .122       .41     .418
15. Anagrams                       .390        .182         -.013       .39     .437
38. Numerical Judgment             .390       -.021          .192       .48     .483
19. Lozenges A                     .388        .083          .310       .54     .520
30. Number Code                    .386        .228          .107       .68     .678
54. Rhythm                         .376        .252          .134       .40     .409
34. Division                       .320        .094          .114       .46     .461
23. Surface Development            .308        .053         -.010       .52     .510
18. Cubes                          .284       -.065         -.026       .51     .495
48. Number-Number                  .279        .024         -.057       .42     .420
 8. Figure Classification          .269        .054         -.131       .45     [illegible]
22. Lozenges B                     .268        .052         -.003       .53     .504
17. Block Counting                 .234       -.014          .021       .40     .389

*Names given to this factor by the other investigators:
General--Eysenck, Holzinger & Harman;
Verbal--Thurstone, Zimmerman.

Spatial and Numerical Factors (Tables 4-5). All five investigations
agree substantially in their selection of tests for the Spatial and Numerical
Factors. It is interesting that the four judgmental procedures and the analytic
method agree in this way, suggesting that placement of axes in factor analysis
has been a less arbitrary affair than some critics have maintained.
Perceptual Factor (Table 6). The quartimax solution has two factors in 
the area, labeled Perceptual A and Perceptual B, each with two loadings 
above .400. The Picture Recall and the Disarranged Sentences Tests, 
grouped together in the Perceptual A Factor, also appear together in Zimmer- 
man’s solution as his Memory for Observed Relationships Factor, so that he 
also isolated two factors within this cluster of tests, whereas other investigators 














CHARLES WRIGLEY, DAVID R. SAUNDERS AND JACK O. NEUHAUS 157 


TABLE 4
Spatial Factor*

Name of test                                  Loadings
                             Quartimax  Thurstone  Zimmerman   H&H   Eysenck

20. Flags                       .833       .636       .727     .72    .750
22. Lozenges B                  .755       .633       .604     .54    .622
18. Cubes                       .737       .626       .592     .58    .606
21. Form Board                  .663       .415       .317     .55    .489
23. Surface Development         .659       .551       .500     .48    .497
17. Block Counting              .645       .413       .524     .56    .589
24. Punched Holes               .645       .336       .266     .50    .453
19. Lozenges A                  .642       .415       .400     .47    .512
27. Pursuit                     .619       .584       .513     .52    .555
53. Hands                       .599       .455       .547     .45    .525
 8. Figure Classification       .512       .393       .222     .42    .404
45. Syllogisms                  .507       .430       .398     --     .325
28. Copying                     .480       .270       .170     .36    .382
30. Number Code                 .471       .109       .374     --     ---
43. Code Words                  .458       .241       .238     --     ---
29. Areas                    [illegible]   .223       .208 [illegible] .336
 6. Verbal Classification       .334       .412       .211     --     ---
55. Sound Grouping              .288       .412       .211     --     ---


























*Names given to this factor: 


Spatial--Thurstone, Zimmerman, Holzinger & Harman; 
Visuo-spatial--Eysenck 


TABLE 5
Numerical Factor*

Name of test                                  Loadings
                             Quartimax  Thurstone  Zimmerman   H&H   Eysenck

33. Multiplication              .833       .812       .769     .74    .743
31. Addition                    .746       .755       .764     .62    .649
32. Subtraction                 .698       .670       .659     .54    .575
34. Division                    .657       .619       .584     .64    .641
30. Number Code                 .581       .625       .619     .41    .418
38. Numerical Judgment          .468       .432       .345     .43    .465
35. Tabular Completion          .432       .392       .402     .40    [illegible]
39. Arithmetical Reasoning      .402       .383       .289     .40    .446


























*Names given to this factor: 

Numerical--Thurstone, Zimmerman; 

Arithmetical--Holzinger & Harman, Eysenck. 
reported only one. Thurstone had nine loadings above .400 on the Perceptual 
Factor, whereas Zimmerman, and also Holzinger and Harman, have only 
three; Eysenck has four. This seems to be a case in which the weight of 
subsequent opinion has been against Thurstone’s analysis. Zimmerman 
considers the Perceptual Factor to be more easily interpreted with the reduced 










TABLE 6
Perceptual Factor*

Name of test                                        Loadings
                             Quartimax A  Quartimax B  Thurstone  Zimmerman   H&H   Eysenck

51. Picture Recall               .562         .231        .545       .341     .48   [illegible]
14. Disarranged Sentences        .559         .068        .461       .300     .35    .427
59. Word Count                   .221         .426        .360       .436     .41    ---
 7. Word Grouping                .206         .160        .573       .376     --     .364
 6. Verbal Classification        .166         .334        .537       .581     .35    .436
26. Identical Forms              .136         .627        .603       .728     .42    .549
11. Completion                   .093         .082        .422       .311     --     ---
41. Verbal Analogies             .090        -.058        .417       .272     --     ---
44. Pattern Analogies            .086         .042        .435       .271     --     ---
60. Vocabulary (Thorndike)      -.040         .005        .412       .291     --     ---





























*Names given to this factor: 


Perceptual Speed--Thurstone, Zimmerman; 
Imagination--Holzinger & Harman; 
Classification--Eysenck. 











TABLE 7
Memory Factor*

Name of test                                        Loadings
                             Quartimax A  Quartimax B  Thurstone  Zimmerman   H&H   Eysenck

48. Number-Number                .724         .026        .664       .709     .36    .457
46. Word-Number                  .436         .043        .529       .518     .48    .499
47. Initials                     .350         .208        .487       .528     .57    .569
50. Figure Recognition           .332         .394        .420       .514     .46    .495
49. Word Recognition             .179         .437        .381       .336     .33    .404





























*Name given to this factor:


Memory--Thurstone, Zimmerman, Holzinger & Harman, Eysenck. 


loadings. Somewhat different names and interpretations have been given to 
the factor by the different investigators. Thurstone [17] isolated a number of 
perceptual factors in his factorial study of perception. The existence of two 
quartimax factors in the area does not therefore seem to be unreasonable. 

Memory Factor (Table 7). The quartimax solution for a second time 
provides two factors in the area, labeled Memory A and Memory B. The 
second memory factor is a doublet including the two recognition tests. 
Other investigators have isolated only one factor in the memory area. 



















Fluency Factor (Table 8). The Fluency Factor appears in the quartimax 
solution only as a doublet. Our results agree with those of Eysenck. Thur- 
stone and Zimmerman secured a much stronger factor. 

Induction Factor (Table 9). No Induction Factor was isolated either in 
the bi-factor analysis or the group factor analysis. In Zimmerman’s solution 
the analysis of the reasoning area was appreciably changed, so that 
Thurstone’s Induction Factor was renamed by Zimmerman the Classification 
Factor, while his Restrictive Reasoning Factor became the General Reasoning 
Factor. In the reasoning area, as in the perceptual area, it has proved difficult 
to isolate clearly defined factors. The quartimax factor has something in 











TABLE 8
Fluency Factor*

Name of test                                  Loadings
                             Quartimax  Thurstone  Zimmerman   H&H   Eysenck

15. Anagrams                    .578       .534       .552     .60    .628
13. First and last letter       .546       .388       .448     .68    .548
12. Disarranged Words           .339       .512       .519     .42    .351
57. Grammar                     .232       .530       .518     --     .351
56. Spelling                    .159       .508       .463     --     ---
60. Vocabulary (Thorndike)      .057       .413       .386     --     ---


























*Names given to this factor:
Word Fluency--Thurstone;
Letter Fluency--Zimmerman;


Completion--Holzinger & Harman; 
Verbal-Linguistic--Eysenck. 


TABLE 9
Induction Factor*

Name of test                            Loadings
                             Quartimax  Thurstone  Zimmerman

 8. Figure Classification       .475       .405       .219
37. Number Series               .452       .503       .437
44. Pattern Analogies           .425       .392       .411
35. Tabular Completion          .288       .479       .491
29. Areas                       .287       .477       .523
39. Arithmetical Reasoning      .158       .331       .642
38. Numerical Judgment          .108       .358       .604
34. Division                    .062       .299       .498




















*Names given to this factor:


Induction--Thurstone;
General Reasoning--Zimmerman.










common with the Induction Factor of Thurstone and something in common 
with the General Reasoning Factor of Zimmerman. 

Audio-Rhythmic Factor (Table 10). The only two highly-loaded tests are 
Sound Grouping and Rhythm. Holzinger and Harman’s Rhythm Factor and 
Eysenck’s Audio-Rhythmic Factor comprise the same pair of tests. These 
are also the two tests with highest loadings on Zimmerman’s Classification 
Factor, although he found two other tests loaded upon the factor. Thurstone 
does not have a comparable factor. 

Syllogistic Reasoning Factor (Table 11). This quartimax factor, like the 
preceding one, is a doublet. All investigators agree upon this factor, and 
in all solutions except Thurstone’s the factor is marked by only a pair of 
tests (False Premises and Reasoning). This is our reason for replacing 


TABLE 10
Audio-Rhythmic Factor*

Name of test                             Loadings
                             Quartimax  Zimmerman   H&H   Eysenck

55. Sound Grouping              .538       .635     .53    .520
54. Rhythm                      .432       .573     .53    .520
 8. Figure Classification       .197       .455     --     ---
45. Syllogisms                  .152       .429     --     ---
25. Mechanical Movements       -.444      -.113     --     ---




















*Names given to this factor: 
Classification--Zimmerman; 
Rhythm--Holzinger & Harman; 
Audio-Rhythmic--Eysenck. 

TABLE 11
Syllogistic Reasoning Factor*

Name of test                                  Loadings
                             Quartimax  Thurstone  Zimmerman   H&H   Eysenck

42. False Premises              .580       .578       .629     .58    .575
40. Reasoning                   .529       .525       .608     .58    .575
25. Mechanical Movements        .132       .403       .328     --     ---
 8. Figure Classification       .121       .398       .226     --     ---























*Names given to this factor: 


Deduction--Thurstone, Zimmerman; 
Logical Reasoning--Holzinger & Harman; 
Relational--Eysenck. 
















Thurstone’s wider name of Deduction by the narrower one of Syllogistic 
Reasoning. 

Table 12 gives the transformation matrix for the rotation from centroid 
to quartimax loadings. In Table 13 appears the full set of quartimax factor 
loadings. 

The cosines of the angles between the quartimax rotated axes and 
Thurstone’s simple structure axes [19] are: Numerical, .968; Spatial, .920; 
General-Verbal, .879; Memory A, .803; Induction, .750; Syllogistic Reasoning, 
.606; Fluency, .560; Perceptual B, .515; Perceptual A, .496. 
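
The cosine between two axes is the normalized inner product of their direction vectors in the common centroid space. Thurstone's direction vectors are not reproduced in this paper, so the following Python sketch only illustrates the computation. It uses the G.V. and S columns of Table 12 with decimal points restored; the function name `cosine` is our own. Because the quartimax transformation is orthogonal, two distinct quartimax axes should give a cosine near zero, and any axis compared with itself gives unity.

```python
import math

# Columns G.V. and S of Table 12, rows I to XIII, decimal points restored.
GV = [.833, .479, -.036, .117, .086, .181, -.029,
      -.082, -.022, .005, -.092, .034, -.059]
S = [.458, -.714, -.395, -.177, -.231, -.062, .023,
     .112, .094, -.020, -.119, -.020, -.002]


def cosine(u, v):
    """Cosine of the angle between two axes given as direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(cosine(GV, GV))  # an axis compared with itself
print(cosine(GV, S))   # two orthogonal quartimax axes; close to zero
```

A cosine against one of Thurstone's axes would be obtained in the same way, with the second vector taken from his transformation matrix.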


Discussion 


The quartimax method might be adjudged useful on either of two 
grounds: first, that it provides a sufficiently good approximation to the 
desired rotational solution to warrant its use in the initial stage as a step 
towards the goal; or, second, that the quartimax results sufficiently meet 
psychological requirements to be taken as ends in themselves. 

Let us start by considering the degree of agreement between the various 
analyses. Three classes of factors may be distinguished. 


TABLE 12

Orthogonal Transformation Matrix for Rotating
Centroid Matrix to the Quartimax Factor Matrix*

        G.V.     S     N  P(A)  P(B)  M(A)  M(B)     F     I    AR    SR    12    13

I        833   458   256   059   025   097   043   060   088   007   050  -012   043
II       479  -714  -286   131   095  -092   084   196  -250   171   021  -011   087
III     -036  -395   714  -236  -238   212  -092   052   034  -057   259  -193   237
IV       117  -177  -279  -204   158   474   313  -413   305  -418   143  -132  -104
V        086  -231   332   326   250  -138  -275  -559   092  -115  -409   250  -046
VI       181  -062  -082  -405  -509  -424  -039  -211  -032  -213   171   434  -211
VII     -029   023   001  -306   474  -102  -061  -231   187   548   44?   249   147
VIII    -082   112   239  -057   239  -121   604  -161  -634  -160   023   111   133
IX      -022   094  -194   279  -355   503  -189  -272  -317   144   168   328   360
X        005  -020   113   229  -237   031   161  -326  -134   438   182  -334  -625
XI      -092  -119   199   221  -013   257   369   361   281   074   004   620  -294
XII      034  -020  -021  -317  -204   058   354  -150   188   432  -611  -038   266
XIII    -059  -002  -034   485  -207  -407   340  -125   396  -065   289  -121   401





















































*Decimal points have been omitted.


1. Clearly defined and easily recognized factors upon which all investi- 
gators substantially agree, and in which there is virtually no doubt as to the 
tests most representative of the factor, e.g., the Numerical and Spatial 
Factors. 

2. Factors which all investigators tend to isolate, but with disagree- 
ments as to the most suitable name, the most representative tests, and the 
amount of variance attributable to the factor. The perceptual area, for 
example, does not seem to be as well structured as either the spatial or the 
numerical area. It will be recalled that Thurstone had nine tests with loadings 










TABLE 13 


Quartimax Factor Loadings: Thurstone's Primary 
Mental Abilities Study 





Factor 





P(B) | M(A) | M(B)} F I AR SR 
~099 | -213 | -013 | 122] -058] -207 | -002| o7h| 095 
-O14 | -163 033 no} 124] -209 | -095 084 | -013 
019 166 | 334 060 | -135| -097 165 172| -020 
051 206 | 160 | -167 | -028/ -047 081 O11] -ok2 
004 | -052 105 | -O41l | -006 138 475 197 121 
003 009 037 008 | -066 193 | -005 001] -164 
085 198 | -219 | -025 | -075 116 | -128 | -020| -068 
-095 093 082 | -012 | -061] 136 | -151 | -246| -222 
128| 036 O44 | -001 180} 339 | -019 | -050] -o12 
090 | -089 | -039 | -122/ -013 546 | -021 | 020] 080 
146 559 068 | 058] -033 017 034 181} 019 
15 390 | 038 172] -012 | -055 100 | obo} 578} 082] 081] -032 
16 761 | -078 115 093 086 | -008 | 017 243 | -033 

17 234 645 102 158 | -045 | -115 | -082 | -086 | O47 | -306 | -183 


12 
029 
068 
-062 
ake 
087 
O47 
013 
-076 
-086 
19 
063 
-102 
123 
059 
737 -132 
19 388 642 031 | -079 | -037 | -159 261 | -224 | -075 087 119 032 | -043 

097 
089 
-007 
-048 
011 
-024 
O48 
-099 
-016 
-176 
337 
227 
O42 
-108 
-221 













18 28h og2 117 153 | -097 | -243 106 o72 | -009 | -008 -150 
20 131 | 833 137 | -032 | -013 | -059 | -066 | 021 | -197 057 008 -182 
21 514 663 | -035 | -002 | -036 | -02h 297 089 100 | -202 | -201 100 
22 268 | 755 031] 088 035 201 | 017] -003 | -o48 152} 200 005 
( 23 308 659 035 | -Obb 107 161 | -012 054 | -097 | -012 O46 231 
! 2k 393 | 6h5 010 | -241 018 | 002 317 | -o92 118 | -083 140 083 
‘ 25 453 391 | -0O71] -108 051 | -069 109 088 253 | - bbb 132 -063 
26 391 | 279 | -122 136 | 627 056 | -065 | -013 | -037 | -ob6 | -107 -030 
27 155 619 204 010 | 021 | -6 | -199 055 159 118 | -090 212 


480 102 108 071 142 060 | -005 394 | -1k2 | -188 
2k9 O49 | -073 287 | -068 | -021 




° 30 | 386 471 581 130 | -Ob1 103 001 | 022] 015 | -180 | 027 -039 
t 31 112 167 746 029 075 O49 | -018 080 031 004 090 065 
: 32 | 299 O74 698 095 | -142 | -150 16 O46 | -091 198 106 108 
b4 33 | 164) 164 833 O46 | 007 068 | O41] 079 | -036 | 010] -160 -016 
4 34 | 320 | 200 | 657] -128 | 066 057 | -206 | -050 | 062 | 021 | 060 -132 
. 35 420 | 226 | 432] -267 | -015 253 | -035 011 288 | -026 066 | -034 | -o14 
4 36 | 292 251 | -003 234 | -327 | -053 | -137 | -246 | -074 | -158 122 122 | -103 











; 39 497 367 4o2 036 | -234 188 | -133 | -176 158 | -325 059 | -167 015 

ho 656 158 055 | -153 | -158 028 125 006 102 | -050 529 O42 | -072 
4 kl 732 32h 060 090 | -058 | -097 | -087 | -057 141 031 | -017 418 | 068 
; ko 633 105 O44 | -030 | -010 14 | -056 038 | 071 | -038 580 | -065 | -02h 
‘ 43 653 | 458 186 | -ohe O77 169 030 | -147 un7 065 | -019 259 | -131 
* ky 670 336 | -038 086 O42 | -132 O43 | -005 425 | -103 | -00% | -ok8 | -096 

4s 538 507 175 | -232 059 108 | -198 021 062 152 052 016 116 
: 46 334 069 032 076 099 436 043 164 | -114 | -182 106 003 158 
t 47 562 | -020 | O40} -070] -024 350 | 208 263 | -089 | -018 122 116 ako 
- 48 279 106 241 034 Oly 72k 026 | -065 O77 | -002 | 030 | -028 | -02k 
4 4g 438 138 | -050 ; 083 | -012 179 437 142 | -002 020 089 | -084 | -002 
‘ 50 365 ’y2 | -113; 066} -150 332 394 061 | -007 013 | -075 064 | -188 

51 264 060 137 562 231 08h 088 | -100 | 039 | -089 |! -185 | -023 | -018 


52 665 | -062 117 081 036 064 219 | -179 | -obo 197 Ob | -204 | 258 
53 064 599 208 167 | -234 226 | -019 157 | -065 147 | -071 | 079 | -125 
54 376 165 027 | -084 | -091 | -012 036 097 076 431 | -098 | -198 200 
55 599 288 1ke 101 | -006 | -117 | -021 084 ike 538 | O11 | 050 | -055 
56 636 | -084 167 149 | -108 | -125 | -055 159 | -094 032 165 | -341 | -002 
57 687 | 090 167 024 | -250 | -oho | - 231 | 037 197 | 203 | -176 | -095 
58 Tue | -086 020} -259 131 | -29h 136 | -134 | -364 009 | -184% | -112 209 
59 263 | -101 | 264 j 221 426 | -023 | 031 | -125 O44 | -032 075 | -O71 136 
60 932 | 030 | -019; -O40 | 005 058 | -047 057 | -023 | -132 | -080 | -082 | -3h4 


| 
Sum 15.245 |7.460 3-807 | 1.490 1.384 {1.749 [1.083 1.639 |1.517 [1.510 |1.302 [2.165 |1.134 





















































above .400 on the Perceptual Factor, but Zimmerman, also aiming at isolation 
of a simple structure factor, found only three. The complexity of the area 
was revealed in Thurstone’s subsequent intensive analysis [17] of perceptual
tests. There seems to be a similar difficulty in deciding upon the tests most
representative of the Fluency Factor. 

3. Factors teased out by some investigators but not by others. There 
are numerous illustrations in this paper. Thurstone’s Induction Factor did 
not appear in Zimmerman’s reshuffle but reappears in the quartimax solution. 
Holzinger and Harman identify an Analogies Factor, based on the Verbal 
Analogies, Code Words, and Pattern Analogies Tests, but with very small 
loadings. The same three tests appear together on Zimmerman’s Eduction 
Factor with higher loadings. In the quartimax solution, the three appear 
together with high loadings on the General-Verbal Factor. 

Twenty years’ use of multifactor methods has indicated the very compli- 
cated pattern of linkages among the tests. Guilford [6] has found the necessity 
of postulating an ever-increasing number of factors as his researches have 
proceeded, and Guttman [7] has invoked new mathematical models in the 
attempt to fit correlational patterns more closely. It is hardly surprising 
that one positioning of axes reveals some of the linkages (i.e., factors) and 
another reveals others. The quartimax results provide an interesting example. 
Previous investigators had reported only one factor in the memory area. 
The quartimax provides two. The second factor represents the fact that the 
two recognition tests have a higher correlation with one another than with 
the recall tests, i.e., the submatrix of correlations of the five memory tests 
is not of rank one. 
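
The rank-one condition can be examined through tetrad differences: if a single common factor accounts for a set of correlations, every difference r_ij r_kl - r_il r_kj among four distinct tests vanishes. The Python sketch below uses hypothetical correlations, not those of the Primary Mental Abilities study, and the function names are our own. Five tests with equal loadings on one factor give vanishing tetrads; raising the correlation between the last two tests, as a stand-in for the two recognition tests, produces non-zero tetrads and hence a doublet.

```python
from itertools import permutations


def max_abs_tetrad(r):
    """Largest |r_ij * r_kl - r_il * r_kj| over distinct i, j, k, l."""
    n = len(r)
    return max(abs(r[i][j] * r[k][l] - r[i][l] * r[k][j])
               for i, j, k, l in permutations(range(n), 4))


def one_factor_matrix(loadings):
    """Correlations implied by a single common factor (unit diagonal)."""
    n = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# Hypothetical: five memory tests, each loaded .6 on one factor.
rank_one = one_factor_matrix([0.6] * 5)

# Hypothetical: tests 4 and 5 correlate more highly with one another.
with_doublet = [row[:] for row in rank_one]
with_doublet[3][4] = with_doublet[4][3] = 0.60

print(max_abs_tetrad(rank_one))      # effectively zero
print(max_abs_tetrad(with_doublet))  # clearly non-zero
```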

The Primary Mental Abilities tests can therefore be classified in various 
ways; there are advantages and disadvantages in each system. There has 
been much discussion in the past as to whether it is better to have a hier- 
archical organization of factors, as is characteristic of the Eysenck and the 
Holzinger and Harman solutions, or a coordinate organization, as is to be 
found in simple structure solutions such as those of Thurstone and Zimmer- 
man. The psychological differences between the two types of classification 
should not be stressed too greatly, since there is in fact rather little dis- 
agreement as to the principal factors represented in the battery. The main 
differences lie in the relative emphasis to be attached to the various factors. 

The first matter of concern is whether the quartimax method provides a 
sufficiently reasonable rotational approximation to warrant its use in the 
initial stage of rotation. In our view it does. Except for the absence of the 
Verbal Factor, the quartimax solution fairly closely reproduces Holzinger 
and Harman’s, even to the extent of having the same doublets. Eysenck’s 
results are also fairly similar. Since the quartimax results tend to the hier- 
archical form of classification, agreement is not quite as good with the two 
simple structure solutions. Nonetheless, eight of Thurstone’s nine factors 
are represented in the quartimax results. Differences are mostly differences of 
emphasis. Thurstone’s Verbal Factor has been built up into a general factor, 
and the Perceptual and Word Fluency Factors are reduced in size and
importance. The quartimax results agree on the whole somewhat better with
Zimmerman’s revision than Thurstone’s original solution. 

The proposition that the quartimax results sufficiently meet psychological 
requirements to be taken as an end in themselves is obviously harder to main- 
tain. For many the main deficiency in the quartimax solution will be the 
presence of a general factor. Obviously a quartimax solution is not a simple 
structure. 

The presence or not of a general factor in a quartimax rotation depends 
upon the particular set of correlations. (See the examples in [12].) Because 
the sum of squares remains constant in orthogonal transformation, the highest 
fourth-power sum is attained by concentrating variance for each test into one 
or as few loadings as possible in each row of factor loadings. That is to say, 
the objective of the quartimax method is to get the simplest possible factorial 
description of each test, and this is attained by finding the nearest solution 
to unifactoriality (each test loaded on a single factor) which is achievable 
by orthogonal transformation. The quartimax function places no restriction 
upon the distribution of variance by columns. If concentration of variance 
occurs largely in a single column, i.e., with respect to a single factor, we 
get a general factor. Otherwise we do not. (The maximization function is 
performed at present on pairs of columns, but this is merely a computational 
necessity because no solution is currently available for the nonlinear equations 
resulting from maximizing the full set of loadings simultaneously, [12], p. 82.) 
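
The argument of this paragraph can be illustrated numerically. In the Python sketch below the loadings are hypothetical and the function names are our own. An orthogonal rotation of two columns leaves each row's sum of squares unchanged, while the sum of fourth powers is larger when each test is loaded on one factor only than when the same variance is spread evenly over both.

```python
import math


def quartimax_criterion(loadings):
    """Sum of the fourth powers of all loadings."""
    return sum(a ** 4 for row in loadings for a in row)


def communalities(loadings):
    """Row sums of squared loadings."""
    return [sum(a * a for a in row) for row in loadings]


def rotate_pair(loadings, j, k, angle):
    """Orthogonal rotation of columns j and k through `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    out = [list(row) for row in loadings]
    for row in out:
        a, b = row[j], row[k]
        row[j] = a * c + b * s
        row[k] = -a * s + b * c
    return out


# Hypothetical unifactorial pattern: each test loaded on one factor only.
unifactorial = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.5]]
# The same tests after a 45-degree rotation: variance split between factors.
spread = rotate_pair(unifactorial, 0, 1, math.pi / 4)

print(communalities(unifactorial), communalities(spread))
print(quartimax_criterion(unifactorial), quartimax_criterion(spread))
```

In this example the 45-degree rotation halves the fourth-power sum, which is why the criterion favors the unifactorial position.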

The general factor has been the subject of considerable controversy. 
Spearman [15] criticized Thurstone’s own solution in this study because of the 
absence of a general factor. He argued that a general factor was entirely 
reasonable, since this provided statistical representation for the predominantly 
positive correlations among the tests. Some others agreed with Spearman. 
Neither Holzinger and Harman nor Burt could see any psychological objec- 
tion to a general factor, and both the group factor method and the bi-factor 
method specifically provide for one. 

The issue for Thurstone, on the other hand, was not so much whether a 
general factor exists as that of the best way of portraying it for scientific 
purposes. He argued that a general factor lacked the invariant property of 
group factors, i.e., that group loadings would change less when further tests 
were added to the battery. Hence his second criterion for simple structure 
eliminated the possibility of a general factor by requiring some zero or near- 
zero loadings in every column, even if oblique axes were needed to achieve 
this. But he regarded the issue as still open. He wrote: “The newer methods
(the multiple-factor methods) leave it as a question of fact whether a general
factor is in the battery and whether it is an orthogonal general factor or a
second-order general factor” ([18], p. 273). And the acceptance of second-order
general factors indicated a willingness to accept some form of hierarchical 
organization among the simple structure factors. 

At least four lines of action are possible in view of this tendency for
the quartimax method to give a general factor: first, to regard the quartimax
loadings as only an approximation to the desired rotation and to complete the 
rotation graphically; second, to regard the general factor as evidence for 
poor test selection and as an indication that the data should be reanalyzed 
with a reduced set of variables; third, to accept the general factor as psycho- 
logically quite reasonable, as Spearman and others would have done; fourth, 
to modify the maximization function in the attempt to get rotational solutions 
approximating more closely to simple structure. Each alternative will be 
briefly discussed. 

1. The first possibility is to regard the quartimax loadings as only an 
approximation and to make final adjustments, including elimination of the 
general factor, by plotting graphs. For example, the variance of the quartimax 
general-verbal factor might be split by rotation with the bipolar Audio- 
Rhythmic Factor, or with the unreported Factors 12-13. Cattell has followed 
this course with quartimax solutions he has obtained from Illiac (the Univer- 
sity of Illinois computer). This may eventually prove to be the only feasible 
solution for the dilemma, but it is one that should not be accepted until other 
avenues have been closed. The computational disadvantages of graphical 
methods are evident. Even more important, subjective rotational solutions 
suffer from the disadvantage that it is difficult to secure agreement upon the 
definitive solution. For example, is Thurstone’s simple structure solution
or Zimmerman’s simple structure solution to be preferred for the Primary 
Mental Abilities study? So long as application of the techniques remains a 
matter for the investigator’s judgment, there will be no assurance that any 
two persons starting with the same data will reach the same results, and 
factor analysis will remain an arbitrary affair. The objectivity of an analytic 
approach is a virtue not to be discounted lightly. 

2. A general factor might be regarded as evidence for poor test selection. 
There may be too many complex variables in the battery which carry 
appreciable loadings on more than one factor, or disproportionately many 
tests drawn from some areas relative to others. For example, too many verbal 
tests may have been included in the Primary Mental Abilities study. The 
initial quartimax solution could be used to identify over-represented areas 
and factorially complex tests. The data could then be reanalyzed for the 
purified set of variables in the hope that the general factor would no longer 
appear. Achievement of a solution without a general factor would indicate 
good experimental design and attainment of a representative sample of tests. 
But any doctrine of test purification seems hazardous. We run the risk of 
contorting experimental data to fit theoretical preconceptions rather than 
allowing theories to be moulded and reshaped by empirical findings. Spearman 
tried to maintain his Two-Factor Theory by rigorous requirements of test 
selection, but it was a futile effort. The existence of multiple factors eventually 
could not be denied. 

3. The problem of the general factor is often construed in terms of black
and white; we should always accept a general factor or never do so. But when
the experts disagree, so that a good case can be made both for and against, 
may not the flexibility of the quartimax method, sometimes yielding a general 
factor and sometimes not, be a desirable feature which will help in finding a 
way around the impasse? The quartimax method is effectively an appeal to 
parsimony. The aim is to find the rotation with the simplest factorial descrip- 
tion of the tests, in the sense that each test has appreciable loadings on as 
few factors as possible. When a general factor increases this parsimony, the 
general factor is accepted. This will usually enable test relationships to be 
stated in slightly fewer factors than in a simple structure solution, and in 
terms of a higher proportion of zero or near-zero loadings. But when a general 
factor does not contribute to parsimony in the solution, it is rejected. 

4. The fourth possibility is modification of the function for maximization.
A very promising modification is that of Kaiser [10] in his varimax
procedure. He has inverted the quartimax logic. Instead of aiming at in- 
equalities in the rows of loadings, his concern is to get maximal inequalities 
in the columns. His logic is to get a simple and distinctive account of each 
factor, with some tests highly loaded on it and some not. He accordingly 
maximizes the sum of variances for the individual columns of squared loadings 
rather than the variance for all squared loadings considered together, as the 
quartimax function does. Kaiser finds the varimax results to correspond more 
closely to simple structure than do the quartimax. Although there are slightly 
fewer small loadings there is a better distribution of them, in terms of simple 
structure concepts. The examples he has reported [11] usually meet Thurstone’s 
five simple structure criteria. So long as varimax solutions are found to meet 
the simple structure criteria, it might be reasonable to accept the varimax 
rotation as the definitive simple structure. 
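
The difference between the two functions can be seen in a small numerical case. The Python sketch below is ours and uses hypothetical loadings; it computes the unnormalized quantities exactly as described in the preceding paragraph. The two patterns are not rotations of one another. They merely contain the same set of loadings arranged differently: one places every test on a single general factor, the other divides the tests between two group factors.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def quartimax_variance(loadings):
    """Variance of all squared loadings considered together."""
    return variance([a * a for row in loadings for a in row])


def varimax_sum(loadings):
    """Sum over columns of the variance of the squared loadings."""
    n_cols = len(loadings[0])
    return sum(variance([row[j] ** 2 for row in loadings])
               for j in range(n_cols))


# Hypothetical patterns containing the same set of loadings.
general = [[0.7, 0.0], [0.7, 0.0], [0.7, 0.0], [0.7, 0.0]]
grouped = [[0.7, 0.0], [0.7, 0.0], [0.0, 0.7], [0.0, 0.7]]

print(quartimax_variance(general), quartimax_variance(grouped))
print(varimax_sum(general), varimax_sum(grouped))
```

The overall variance is the same for both patterns, so the quartimax function does not distinguish them. The column-wise sum is zero for the general-factor pattern and positive for the grouped one, so the varimax function prefers the latter.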

A current problem, therefore, is to decide between the quartimax ro- 
tational model, which resembles more nearly the hierarchical organization 
favored by Burt, and the varimax model, which resembles more the coordinate 
organization adopted by Thurstone. How are we to decide which is the more 
suitable psychologically? 

Ideally, rotational solutions should meet the four requirements of: (a) 
objectivity and uniqueness, (b) meaningfulness, (c) parsimony of description, 
(d) numerical invariance of loadings for retained tests when further tests are 
added. 

Objectivity is one of the principal advantages of analytic solutions 
over graphical ones. By this is meant that another investigator starting with 
the same data and following the same analytic rules will obtain the same 
results. This does not hold for graphical methods. Thurstone and Zimmerman, 
both trying to apply the same simple structure criteria, reached different 
results. The objectivity of the analytic solutions has been stressed rather 
than the uniqueness because of a practical limitation of current computational
techniques. The desirable procedure would be maximization of the entire
matrix simultaneously. The present method, however, operates with pairs of 
columns of loadings, and different ordering of factors is known to give slightly 
different results ([12], p. 82). That is, at present an approximate quartimax 
or varimax is calculated instead of the true one. But once a rule is made 
upon the ordering of factors, results become replicable for the same data and 
fully comparable for different sets of data. This objectivity, however, is 
common both to the quartimax and the varimax methods, so that it provides 
no ground for selecting between them. 
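
The pairwise procedure may be sketched as follows. This Python fragment is our own illustration, not the Illiac program: the loadings are hypothetical, and the angle for each pair is obtained from tan 4φ = 2Σuv / Σ(u² - v²), with u = a² - b² and v = 2ab, which follows from maximizing the fourth-power sum for two columns taken alone. One sweep is run with the pairs in natural order and one with the order reversed, so that the two sets of results can be compared.

```python
import math
from itertools import combinations


def criterion(loadings):
    """Sum of the fourth powers of all loadings."""
    return sum(a ** 4 for row in loadings for a in row)


def best_angle(loadings, j, k):
    """Angle maximizing the fourth-power sum for columns j and k alone."""
    num = den = 0.0
    for row in loadings:
        u = row[j] ** 2 - row[k] ** 2
        v = 2.0 * row[j] * row[k]
        num += 2.0 * u * v
        den += u * u - v * v
    return math.atan2(num, den) / 4.0


def sweep(loadings, pair_order):
    """Rotate each pair of columns once, in the order given."""
    out = [list(row) for row in loadings]
    for j, k in pair_order:
        angle = best_angle(out, j, k)
        c, s = math.cos(angle), math.sin(angle)
        for row in out:
            a, b = row[j], row[k]
            row[j], row[k] = a * c + b * s, -a * s + b * c
    return out


# Hypothetical unrotated loadings for five tests on three factors.
start = [[0.6, 0.5, 0.1], [0.7, 0.4, -0.2], [0.5, -0.4, 0.3],
         [0.6, -0.5, 0.2], [0.4, 0.1, 0.6]]

pairs = list(combinations(range(3), 2))
forward = sweep(start, pairs)
backward = sweep(start, list(reversed(pairs)))
print(criterion(start), criterion(forward), criterion(backward))
```

Each pairwise step can only raise or hold the criterion, so both orderings improve on the starting position; whether they agree after a given number of sweeps depends on the data.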

Nor is the criterion of meaningfulness of much assistance in deciding 
between the two methods. The group factor method has seemed psycho- 
logically acceptable to Burt, and simple structure to Thurstone. Since the 
two approaches have arrived at more or less the same factorial description 
of ability, it does not seem reasonable to assert that one makes good psy- 
chological sense but the other not. 

In terms of parsimony, the advantage seems to rest with the quartimax 
method rather than the varimax. The quartimax line of reasoning is indeed 
an extension of the concept of unifactoriality ([9], pp. 95-98). If tests could 
always be expressed in terms of only one factor, there is little doubt that a 
unifactorial solution would be the preferred one. With empirical data, they 
cannot be expressed so simply. The quartimax method aims at getting as 
simple a factorial expression of each test as the data allow. Hierarchical 
systems of classification have in many branches of science been found to be 
very convenient and parsimonious. 

The quartimax method is principally vulnerable on the issue of factorial 
invariance. If Thurstone is correct that invariance can be attained with 
simple structure but not with a general factor, and if invariance is required, 
the quartimax method obviously has to be rejected as a final solution. The 
principal evidence at present for the invariance of simple structure is derived 
from: (a) Thurstone’s box example ([18], pp. 369-376), and (b) experience 
attained in application of simple structure methods to a wide variety of data. 

The recent development of analytic methods of rotation seems to make 
feasible empirical studies of factorial invariance to try to specify more pre- 
cisely than hitherto the conditions under which loadings are invariant and 
those under which they are not. Since varimax results conform more closely 
to simple structure requirements, it may be hypothesized that the varimax 
loadings should vary less than the quartimax ones as additional tests are added 
to the battery. Kaiser has carried out a preliminary investigation of this 
hypothesis, using the 24-variable correlation matrix of Holzinger and Harman 
([9], p. 30). For these data the varimax loadings are indeed more stable, as
hypothesized. If this greater stability of varimax loadings proves character- 
istic of a representative range of correlation matrices varying considerably 
in factorial structure, it will supply a powerful reason for preferring the
varimax method to the quartimax. Although varimax is more stable than
quartimax in the Holzinger and Harman example, it should not be inferred 
that quartimax results fluctuate greatly. They, too, are rather highly stable, 
and rank ordering of the principal tests defining any factor hardly ever 
changes. Indeed, there appear to be smaller differences in rank ordering of the 
principal tests with the analytic methods with varying numbers of tests than 
with Thurstone’s and Zimmerman’s Primary Mental Abilities simple struc- 
tures, subjectively obtained, although both are calculated for exactly the 
same set of tests. There seems a strong possibility that analytic methods in 
general will be more stable than the traditional graphical methods. 

Invariance has to this point been considered with respect to adding 
further tests to the battery. There is another problem of stability of factors, 
however, which seems so far to have received hardly any logical or empirical 
consideration. No generally accepted objective procedure is currently avail- 
able for determination of the number of factors to be rotated, and some 
investigators will extract and rotate more factors than others. Increased 
dimensionality seems on occasion to lead to splitting of factors. A factor with 
a large number of highly loaded tests will split into two smaller factors each 
incorporating a subgroup of the tests. The problem of invariance with relation 
to an increased number of rotated factors has not yet been systematically 
studied, but it appears likely that the quartimax factors, because of their 
greater tendency to hierarchical organization, will be less likely to split than 
their varimax counterparts, and in this regard will be the more invariant of 
the two analytic techniques. 

A final consideration in comparing quartimax and varimax may be the 
number of negative loadings. There are nineteen negative loadings of .200 or 
greater in the quartimax solution for the Primary Mental Abilities study but 
only four in the varimax. In this respect, therefore, the varimax solution 
seems preferable to the quartimax. 

To summarize these comparisons, present evidence suggests the quartimax 
results are the more parsimonious and may be the more stable when the 
number of factors for rotation is increased, and that the varimax results are 
more stable when the number of tests in the battery is increased and at least in 
the Primary Mental Abilities study have the fewer negative loadings. 

No attempt has been made in this paper to consider the problems of 
analytic rotation to oblique structure. The quartimax method generalizes 
differently to oblique structure according to whether the logic followed is 
that of Carroll [3], Neuhaus and Wrigley [12], or Saunders [13]. The only one 
so far programmed for electronic computer is Saunders’ (for Illiac), so that 
we are not yet in a position to make any systematic study of the oblique 
situation. Kaiser ([11], p. 45) has developed a function for generalizing his
varimax procedure to the oblique case, and the mathematics of the problem 
have recently been developed by Carroll [3a]. 

This paper presents only an interim report upon the problems involved 
in trying to develop an analytic method of rotation and upon selecting the one 
best meeting our psychological requirements. For anyone with access to an 
electronic computer, the analytic methods presently available provide at 
least a rapid and objective approximation to the desired solution. For those 
favoring a simple structure solution, the varimax method appears to be the 
best starting point currently available, while for those preferring hierarchical 
organization of factors, the quartimax appears the more useful. Analytic 
methods should eventually help settle the long-standing controversy between 
protagonists of simple structure and of the general factor by making possible 
empirical studies to define more precisely the conditions under which in- 
variance, parsimony, etc., are best achieved. 


REFERENCES 


[1] Bolin, S. F. A factorial study of criteria of aircraft engine mechanics’ proficiency. 
Unpublished doctoral dissertation, Western Reserve Univ., 1955. 

[2] Burt, C. Group factor analysis. Brit. J. Psychol., Statist. Sect., 1950, 3, 40-75.

[3] Carroll, J. B. An analytical solution for approximating simple structure in factor 
analysis. Psychometrika, 1953, 18, 23-38. 

[3a] Carroll, J. B. Further notes on analytic simple structure solutions. Mimeographed,
1956.

[4] Eysenck, H. J. Review of L. L. Thurstone’s “Primary mental abilities.” Brit. J.
educ. Psychol., 1939, 9, 270-275. 

[5] Ferguson, G. A. The concept of parsimony in factor analysis. Psychometrika, 1954, 
19, 281-290. 

[6] Guilford, J. P. The structure of intellect. Psychol. Bull., 1956, 53, 267-293. 

[7] Guttman, L. A new approach to factor analysis: the radex. In Paul F. Lazarsfeld 
(Ed.), Mathematical thinking in the social sciences. New York: Columbia Univ. 
Press, 1954. 

[8] Holzinger, K. J. and Harman, H. H. Comparison of two factorial analyses. Psycho- 
metrika, 1938, 3, 45-60. 

[9] Holzinger, K. J. and Harman, H. H. Factor analysis. Chicago: Univ. Chicago 
Press, 1941. 

[10] Kaiser, H. F. An analytic rotational criterion for factor analysis. Amer. Psychologist, 
1955, 10, 438. (Abstract) 

[11] Kaiser, H. F. The varimax method of factor analysis. Unpublished doctoral disserta- 
tion, Univ. California, 1956. 

[12] Neuhaus, J. O. and Wrigley, C. F. The quartimax method: an analytic approach to 
orthogonal simple structure. Brit. J. statist. Psychol., 1954, 7, 81-91. 

[13] Pinzka, C. and Saunders, D. R. Analytic rotation to simple structure, II: extension 
to an oblique solution. Princeton, New Jersey: Educational Testing Service Research 
Bulletin RB-54-31, 1954. 

[14] Saunders, D. R. An analytic method of rotation to orthogonal simple structure. 
Amer. Psychologist, 1953, 8, 428. (Abstract) 

[15] Spearman, C. Thurstone’s work reworked. J. educ. Psychol., 1939, 30, 1-16. 

[16] Thurstone, L. L. Primary mental abilities. Psychometric Monogr. No. 1, 1938. Chicago:
Univ. Chicago Press.

[17] Thurstone, L. L. A factorial study of perception. Psychometric Monogr. No. 4, 1944. 
Chicago: Univ. Chicago Press.

[18] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947. 

[19] Wrigley, C. F. and Neuhaus, J. O. The matching of two sets of factors. Amer. Psy- 
chologist, 1955, 10, 418-419. 

[20] Zimmerman, W. S. A revised orthogonal rotational solution for Thurstone’s original 
primary mental abilities test battery. Psychometrika, 1953, 18, 77-93. 


Manuscript received 11/29/56 
Revised manuscript received 8/12/57 

PSYCHOMETRIKA—VOL, 23, NO. 2 
JUNE, 1958 


A DISTINCTION BETWEEN EXACT AND APPROXIMATE 
NONPARAMETRIC METHODS* 


WILLIAM L. SAWREY†


UNIVERSITY OF COLORADO MEDICAL CENTER 


Nonparametric tests are discussed in relation to parametric tests. A 
distinction is made between two types of nonparametric tests. One type 
leads to an exact significance level, the other to an approximate significance 
level. The failure to distinguish between these two types has led to confusion 
and error. Examples are cited. 


The use of nonparametric statistics in psychology has increased rapidly 
during the past few years. This trend has been reflected by the inclusion 
of sections on nonparametric methods in recent psychological texts, and by 
the appearance of a text and reviews solely on the topic of nonparametric 
methods [3, 6, 15, 18, 19, 21, 22, 26]. 

The increasing familiarity of psychologists with these techniques is 
to be welcomed, since these methods may be extremely useful in handling 
certain problems. However, in the light of a number of recent statements 
which have been made regarding so-called nonparametric methods, it also 
appears that misconceptions about the nature of some of these techniques 
can lead to inappropriate or at least to injudicious applications. The purpose 
of this article is to clarify some of these misconceptions by calling attention 
to the essential characteristics of a nonparametric method and by making 
a distinction between two types of nonparametric methods. In addition, 
certain precautions which should be observed in the use of nonparametric 
methods will be noted. 

The basic distinction between parametric and nonparametric methods 
lies in the fact that the latter are distribution free. In fact, they are some- 
times quite appropriately referred to as distribution-free methods. As the 
name implies, they generally do not depend on a specified distribution for the 
population from which samples are drawn. However, nonparametric methods 
are not entirely without assumptions. Some techniques assume a continuous 
distribution in the parent population. Other assumptions include independ- 
ence, same shape of distribution of the parent populations, sampling without 
replacement from a finite universe, and symmetrical distribution of the 


*Based on a paper presented at the annual meeting of the Rocky Mountain Psychological
Association, Salt Lake City, Utah, 1957.

†The author is indebted to Dr. John J. Conger for his many suggestions that greatly
improved the exposition of this manuscript.

parent populations. One should know the assumptions underlying the tests 
in order that they may be most advantageously employed. Thus, the Wilcoxon
(Mann-Whitney) Test has been derived in two ways [8, 16], and either derivation
implies that there are “no ties.” In practice, of course, “crude measurements”
often produce ties. If ties appear, the assumptions are violated to some extent
and the test is no longer exact. Unless the ties make up a sizeable portion of
the total N, however, the exact tables are probably not too far in error. It
should be pointed out that the Wilcoxon Test has been derived by White [27],
Mann and Whitney [16], Festinger [7], and Haldane and Smith [10]. It is also
equivalent to the Kruskal-Wallis test [13] when only two groups are considered.
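
As a minimal sketch of what "exact" means here (an illustration added for the modern reader, not part of the derivations cited), the null distribution of the rank sum can be obtained by enumerating every equally likely assignment of ranks. The function name and the sample values are hypothetical, and the sketch refuses to run when ties are present, since the exact distribution then no longer applies.

```python
from itertools import combinations
from math import comb


def exact_rank_sum_pvalue(x, y):
    """Return (W, P(W <= observed)) for the rank sum of x, assuming no ties."""
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("ties present: the exact null distribution does not apply")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    w_obs = sum(rank[value] for value in x)
    n, total = len(x), len(pooled)
    # Under the null hypothesis every set of n ranks out of n + m is equally likely.
    as_small = sum(
        1 for ranks in combinations(range(1, total + 1), n) if sum(ranks) <= w_obs
    )
    return w_obs, as_small / comb(total, n)


# Hypothetical data: the three x values take ranks 1, 2, 3 out of 7.
w, p = exact_rank_sum_pvalue([1.1, 2.3, 2.9], [3.4, 4.8, 5.0, 6.2])
print(w, p)  # W = 6, one-sided p = 1/35
```

Full enumeration is feasible only for small samples, which is why tables were prepared in the first place.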

As with any parametric test, the application of a nonparametric test requires
that its underlying assumptions be met. Certainly the assumptions underlying
nonparametric methods are generally easier to meet; herein lies
their advantage.

For a parametric test in which the assumptions are exactly met, an 
accurate probability estimate may be obtained. Since this is also true of 
a number of nonparametric methods, accuracy of estimation is not a dis- 
tinguishing characteristic between the two types of tests. However, there 
are a number of tests, frequently referred to as nonparametric, which do not 
yield exact probability estimates; furthermore the relative accuracy of these 
tests may depend upon the shape of the underlying distribution. It is with 
regard to these tests, which we might refer to as semi-nonparametric, that 
many current misconceptions arise. 

In order to gain a clearer understanding of the basis of some of these 
misconceptions, it seems advisable to draw a distinction between exact 
nonparametric methods and the semi-nonparametric methods. To facilitate 
an understanding of this distinction, Fisher’s exact test for a 2 × 2 table
will be considered. This test can be derived assuming independence of obser- 
vations within and between cells. The exact probability of a configuration 
of cell frequencies can be calculated, given the marginal totals. Hence this 
may be referred to as an exact test. Another so-called nonparametric test 
advocated for use in this situation is the chi square test of independence. 
It is well known, and has been pointed out most adequately for psychologists 
by Lewis and Burke [14], that chi square when used in this type of situation 
is approximate. Their article also lists several errors made in using chi square. 
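
The contrast can be sketched numerically. The following is an illustrative sketch only: the function names and the 2 × 2 table are hypothetical, the chi square is computed without a continuity correction, and the chi square tail area is obtained from the standard identity for one degree of freedom rather than from a table.

```python
from math import comb, erfc, sqrt


def fisher_exact_one_sided(a, b, c, d):
    """Exact P(top-left cell >= a) with all marginal totals held fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    upper = min(row1, col1)
    favourable = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return favourable / comb(n, col1)


def chi_square_2x2(a, b, c, d):
    """Chi square statistic (no continuity correction) and its approximate tail area."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi square > s) = erfc(sqrt(s / 2)).
    return stat, erfc(sqrt(stat / 2))


exact_p = fisher_exact_one_sided(3, 1, 1, 3)   # exactly 17/70
stat, approx_p = chi_square_2x2(3, 1, 1, 3)    # statistic is 2.0
print(exact_p, stat, approx_p / 2)             # halve the tail for a one-sided comparison
```

With cell frequencies this small, the halved chi square tail area falls well below the exact probability of 17/70, which is the kind of discrepancy at issue.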

Here then are two nonparametric tests for the same hypothesis, one 
which is exact and one which is approximate. An exact test is one in which, 
if the alpha level of significance is obtained, we are accurate and correct 
in stating the odds that this difference could have happened by chance. On 
the other hand, with approximate-type tests such as chi square, if we obtain 
or exceed the alpha level of significance we cannot be confident of accuracy 
in quoting the odds that such a difference could have occurred by chance.
Unlike an exact nonparametric test, the true odds for a chi square may be
more or less favorable than the level of significance indicates, depending 
upon the accuracy of the approximations involved. 

Although the distinction between a parametric test and an exact non- 
parametric test seems relatively clear, the distinction between these two 
types of tests and approximate nonparametric tests seems a little confused 
at times. In the case of semi-nonparametric or approximate tests, the dis- 
tributions of the parent populations from which samples are drawn may be 
very general, but these tests really depend on limiting processes for their 
accuracy. Therefore, they are not parametric tests. On the other hand, the 
level of confidence obtained is not exact, and they cannot be called exact 
nonparametric tests. 

Because of the wide use of chi square it will be taken up in more detail 
to illustrate the ways in which failure to distinguish between exact and 
approximate nonparametric statistics may lead to confusion and error. 
An examination of the derivation [5, 12] of chi square from the multinomial 
distribution indicates that three approximations are employed. These are 
Stirling’s approximation for factorials, an approximation similar to truncating 
an infinite series, and in essence the substitution of an integration for discrete 
summation. 
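
To give a feeling for the first of these approximations, the sketch below compares Stirling's formula with the exact factorial. It is an added illustration, not taken from the derivations cited; the function name and the chosen values of n are hypothetical. The relative error shrinks roughly as 1/(12n), so it is far from negligible for the small frequencies that often occur in practice.

```python
from math import e, factorial, pi, sqrt


def stirling(n):
    """Stirling's approximation to n!: sqrt(2*pi*n) * (n/e)**n."""
    return sqrt(2 * pi * n) * (n / e) ** n


for n in (1, 5, 10, 50):
    relative_error = 1 - stirling(n) / factorial(n)
    # The approximation always falls short of n!, by a fraction close to 1/(12n).
    print(n, round(relative_error, 5), round(1 / (12 * n), 5))
```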

Because chi square can be derived without making any assumption 
about a parent population or any assumptions about parameters of the 
multinomial from which it is derived, it has been called distribution free or 
nonparametric. However, it is clear that the more accurate the three approxi- 
mations mentioned are, the more accurate the chi square test. The accuracy 
of these approximations depends on the speed of convergence to the exact 
value. As Birnbaum states, “The χ² statistic becomes approximately
distribution-free for N → ∞ but is not distribution-free for finite N, and little
is known about the manner in which its actual distribution for finite N and
given F(x) is approximated by its limiting distribution” ([2], p. 435). Calling
the chi square test nonparametric without being careful to distinguish between
exact and semi-nonparametric tests may lead to confusion and error as can
be seen by the following statements from Walker and Lev: “Examples of
non-parametric methods which have already been studied are the χ² test,
the percentiles. ... If the samples are not from a normal population and we
use a non-parametric test with level of significance α, then for any parent
population whatever the probability of an error of the first kind is actually
equal to α or less than α because of discreteness” ([26], p. 426). It is clear
that for chi square, which the above authors call a nonparametric test, the 
statement is incorrect. The same type of conflicting statement occurs in 
Mosteller and Bush [19]. 

Another illustration of the confusion that may result from failure to 
distinguish between parametric and semi-nonparametric tests is found in
another recent text. After advocating the use of chi square as a nonparametric
test the author states in a footnote, “Using a parametric technique Lepley
reached the same decision. He used the critical ratio technique ...” ([21],
p. 130). For most practical applications the critical ratio depends upon the
central limit theorem for its validity. Given normality and known population 
variances then the critical ratio leads to an exact test. If normality is not 
assumed and variances are unknown, an exact critical ratio test is not possible. 
An examination of Lepley’s data indicates that population variances were not 
known. Therefore it seems clear that both of the above techniques are 
actually nonparametric. In fact, when proportions are considered, the square 
of the critical ratio is actually equal to chi square with one degree of freedom 
([15], pp. 227-228). 
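
The identity is easily checked numerically. The sketch below is an added illustration with hypothetical counts and function names; it assumes the critical ratio is formed with the pooled proportion in the standard error and that chi square is computed without a continuity correction.

```python
from math import sqrt


def critical_ratio(x1, n1, x2, n2):
    """Critical ratio for two independent proportions, pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


def chi_square_2x2(a, b, c, d):
    """Chi square for a 2 x 2 table, no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical data: 15 of 40 successes in one group, 25 of 40 in the other.
z = critical_ratio(15, 40, 25, 40)
chi2 = chi_square_2x2(15, 25, 25, 15)
print(z ** 2, chi2)  # the two agree up to floating-point rounding
```

Because the two statistics are algebraically the same, referring one to the normal table and the other to the chi square table with one degree of freedom gives the same approximate significance level.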

In order to appreciate fully that approximate nonparametric tests may 
be confused with parametric tests, especially if used as an approximation 
(i.e., when the assumptions are not fully met), let us look further at chi 
square and the central limit theorem. There are several slightly different 
theorems showing the converging processes that lead to normality of the 
variable being considered [5]. This makes it more apparent that the dis- 
tribution of the parent population from which samples are drawn, and to 
which the chi square test is applied, can be quite general. The crucial factor 
is the speed with which the convergence to normality takes place. The speed 
of convergence depends, as Birnbaum [2] has stated, upon the population 
distribution from which samples are taken. Cochran [4] has pointed out 
that it is important to consider how good an approximation exists. He states,
“Without having looked into the matter, I had once or twice suggested to
research workers that the F test might serve as an approximation even
when the table consists of 1’s and 0’s. As a testimony to the modern teaching
of statistics, this suggestion was received with incredulity, the objection
being made that the F test requires normality, and that a mixture of 1’s and
0’s could not by any stretch of the imagination be regarded as normally
distributed. The same workers raise no objection to a χ² test, not having
realized that both tests require to some extent an assumption of normality,
and that it is not obvious whether F or χ² is more sensitive to the assumption”
([4], p. 262).

The importance of the degree of approximation cannot be stressed too 
much. For example, in certain cases the results of a parametric test may 
involve some error due to a failure to fulfill completely the assumptions 
underlying the test. In such cases, the investigator, recognizing this possibility
in the use of the parametric test, may be likely to turn to a nonparametric
test as a more judicious alternative. He may choose, say, a nonparametric
analysis of variance employing χ² rather than a parametric analysis
of variance because the data are not known to be normal. However, if an
approximate nonparametric technique is even more in error than the parametric
technique, little will have been gained by employing the former. The
one great possible advantage of a nonparametric method occurs when the 
assumptions can be met and when it is also an exact test. Only in this case 
can the obtained level of significance be considered exact. This advantage 
of exactness can be lost both by using approximate nonparametric techniques 
and by using an exact nonparametric test when the underlying assumptions 
are not fully met. 

Most nonparametric methods that are exact have large sample ap- 
proximations given when the sample size exceeds that for which tables are 
available. These large sample approximations again depend on limiting 
processes for their accuracy; as a result these large sample tests become 
semi-nonparametric tests, a fact which is frequently not recognized. Furthermore,
Fix and Hodges [8] have shown they can be “subject to sizeable percentage
errors.” Apparently one even has to be careful of “exact tables.”
Fix and Hodges state that the Mann-Whitney Test tables by White [27] and
Auble [1] “give significant probabilities in most cases with even less accuracy
than the normal approximation” ([8], p. 301). Fortunately in the same article
these authors provide appropriate formulas and tables for calculating the 
exact probabilities. While one may often not err seriously by using approxi- 
mate techniques, he should be aware that he is making an approximate test. 
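
How far a large-sample approximation can stray from the exact value may be sketched as follows. This is an added illustration, not the formulas of Fix and Hodges: the function names are hypothetical, the exact tail is obtained by counting rank subsets, and the normal approximation uses the usual mean and variance of the rank sum with a continuity correction, assuming no ties.

```python
from math import comb, erfc, sqrt


def exact_lower_tail(n, m, w):
    """Exact P(W <= w) for the sum W of n ranks drawn from 1..n+m (no ties)."""
    max_sum = sum(range(m + 1, n + m + 1))
    # counts[j][s] = number of ways to choose j distinct ranks with sum s
    counts = [[0] * (max_sum + 1) for _ in range(n + 1)]
    counts[0][0] = 1
    for r in range(1, n + m + 1):
        for j in range(min(n, r), 0, -1):
            for s in range(max_sum, r - 1, -1):
                counts[j][s] += counts[j - 1][s - r]
    return sum(counts[n][: w + 1]) / comb(n + m, n)


def normal_lower_tail(n, m, w):
    """Normal approximation to P(W <= w) with a continuity correction."""
    mean = n * (n + m + 1) / 2
    sd = sqrt(n * m * (n + m + 1) / 12)
    z = (w + 0.5 - mean) / sd
    return 0.5 * erfc(-z / sqrt(2))  # standard normal distribution function at z


exact = exact_lower_tail(3, 4, 6)      # 1/35
approx = normal_lower_tail(3, 4, 6)
print(exact, approx, (approx - exact) / exact)  # last value is the relative error
```

For samples this small the absolute difference is slight, but the relative error in the tail is not, which is the sense of the "sizeable percentage errors" quoted above.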

In view of the above discussion, it seems apparent that when an exact 
nonparametric test is available for testing a particular hypothesis, it should 
always be used unless the labor is completely prohibitive, the significance of 
the result is apparent in any event, or a parametric test is applicable. In view 
of the fact that research on a problem often takes several or even hundreds of 
hours, it does not seem unreasonable to recommend that an investigator use 
an exact calculation that may take him up to an hour or even two hours. 
Computational methods for exact probabilities for contingency tables are 
given by Freeman and Halton [9]. These may become computationally 
laborious, but if the research is sufficiently important and chi square gives 
a doubtful approximation, then there is really no substitute for such methods. 

Further, if an investigator has to employ an approximate method, he 
should select the most appropriate and accurate one available. Sometimes 
the original source of a test gives various approximations of differing accuracy. 
Clearly, however, more research is needed to determine the accuracy of 
some approximate nonparametric tests as well as the accuracy of parametric 
tests under conditions where the assumptions are not completely met. 

It is not always obvious whether one is using an exact or an approximate 
method. One rough guide is that, in general, whenever an investigator is 
using a nonparametric technique and when chi square, F, t, Beta, Gamma, or
normal tables are used to determine the significance, then he is really using
a semi-nonparametric test. Sometimes this procedure is employed even when
an exact test is available. For example, Fisher’s test with the 2 × 2 table can be
used in place of the standard chi square test of independence, and also in
place of the chi square used in a two group median test. Likewise, exact tests 
[9] are available for some of the nonparametric analysis of variance situations 
described by Mood [17], Wilson [31], and Roy and Mitra [20]. 












Further Considerations in the Use of Nonparametric Tests 


The power of a nonparametric test when compared to its parametric 
counterpart (when all assumptions are met for the parametric test) is always 
smaller. However, one must consider the fact that nonparametric tests are 
used when parametric tests are not appropriate because some assumptions 
are not met. In this case, it seems plausible, and in fact has been demonstrated,
that the nonparametric methods may in some circumstances actually have
more power than the parametric methods [28]. In other words, a nonparametric
test would be more likely to reject a false null hypothesis. One can easily
construct data to demonstrate this fact.
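
One such construction is sketched below as an added illustration; the data and function names are hypothetical. A single extreme observation inflates the variance so that the pooled t falls short of its tabled 5 per cent point for 8 degrees of freedom (2.306), while the exact two-sided rank-sum probability, obtained by enumeration and assuming no ties, is 2/252.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import mean, variance

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 100]   # one extreme value inflates the variance of b


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / sqrt(pooled_var * (1 / nx + 1 / ny))


def exact_rank_sum_two_sided(x, y):
    """Exact two-sided rank-sum probability by full enumeration (no ties)."""
    pooled = sorted(x + y)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n, total = len(x), len(pooled)
    w_obs = sum(rank[value] for value in x)
    center = n * (total + 1) / 2
    extreme = sum(
        1
        for ranks in combinations(range(1, total + 1), n)
        if abs(sum(ranks) - center) >= abs(w_obs - center)
    )
    return extreme / comb(total, n)


T_CRITICAL_8DF = 2.306  # tabled two-sided 5 per cent point of t, 8 degrees of freedom
t_stat = pooled_t(a, b)
p_exact = exact_rank_sum_two_sided(a, b)
print(abs(t_stat) > T_CRITICAL_8DF, p_exact < 0.05)  # t does not reject; rank test does
```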

Another related consideration which should be borne in mind is that 
nonparametric techniques most often test slightly different hypotheses than 
do parametric tests, as, for example, a median difference rather than a mean 
difference. If the distributions are skewed, these become definitely different 
hypotheses. Thus, one has to be especially careful to be aware of the exact 
statistical hypothesis under test when using a nonparametric technique and 
to relate his conclusions to this particular hypothesis. To illustrate, some
nonparametric tests are sensitive to differences in shape, variability and
location [24, 25]. Others may be sensitive mainly to location differences 
[13, 29] or mainly to variability differences [11, 23]. 

In conclusion, the following suggestions regarding the use of nonpara- 
metric tests can be specified. (a) Consider just how much error there may be 
in an approximate test. (b) Always use an exact test when at all feasible. 
(c) Carefully consider the assumptions involved in order to determine whether 
an exact nonparametric test is possible. In some instances, this can only be 
done by consulting the original source. (d) Select the tests carefully in relation 
to the hypothesis being tested and any alternative hypotheses that the in- 
vestigator may wish to guard against. 

REFERENCES 


[1] Auble, D. Extended tables for the Mann-Whitney statistic. Bull. Inst. educ. Res., 
Indiana Univ., 1953, 1, 39. 

[2] Birnbaum, Z. W. Numerical tabulation of the distribution of Kolmogorov’s statistic 
for finite sample size. J. Amer. statist. Ass., 1952, 47, 425-441. 

[3] Blum, J. R. and Fattu, N. A. Nonparametric methods. Rev. educ. Res., 1954, 24, 
467-487. 

[4] Cochran, W. G. The comparison of percentages in matched samples. Biometrika, 
1950, 37, 256-266. 

[5] Cramer, H. Mathematical methods of statistics. Princeton: Princeton Univ. Press, 1946. 

[6] Edwards, A. L. Statistical methods for the behavioral sciences. New York: Rinehart, 
1954. 

[7] Festinger, L. The significance of differences between means without reference to the 
frequency distribution function. Psychometrika, 1946, 11, 97-105. 

[8] Fix, Evelyn and Hodges, J. L., Jr. Significance probabilities of the Wilcoxon test. 
Ann. math. Statist., 1955, 26, 301-312. 

[9] Freeman, G. H. and Halton, J. H. Note on an exact treatment of contingency, good- 
ness of fit, and other problems of significance. Biometrika, 1951, 38, 141-149. 

[10] Haldane, J. B. S. and Smith, C. A. B. A simple exact test for birth-order effect. Ann.
Eugen., 1947-49, 14, 117-124. 

[11] Kamat, A. R. A two-sample distribution-free test. Biometrika, 1956, 43, 377-387.

[12] Kendall, M. G. The advanced theory of statistics. Vol. 1. London: Griffin, 1948. 

[13] Kruskal, W. H. and Wallis, W. A. Use of ranks in one-criterion variance analysis. 
J. Amer. statist. Ass., 1952, 47, 583-621. 

[14] Lewis, D. and Burke, C. J. The use and misuse of the chi-square test. Psychol. Bull., 
1949, 46, 433-489. 

[15] McNemar, Q. Psychological statistics. New York: Wiley, 1955.

[16] Mann, H. B. and Whitney, D. R. On a test of whether one of two random variables 
is stochastically larger than the other. Ann. math. Statist., 1947, 18, 50-60. 

[17] Mood, A. M. Introduction to the theory of statistics. New York: McGraw-Hill, 1950. 

[18] Moses, L. E. Non-parametric statistics for psychological research. Psychol. Bull., 
1952, 49, 122-143. 

[19] Mosteller, F. and Bush, R. R. Selected quantitative techniques. In G. Lindzey (Ed.),
Handbook of social psychology. Vol. 1. Theory and method. Cambridge, Mass.: Addison- 
Wesley, 1954. Pp. 289-334. 

[20] Roy, S. N. and Mitra, S. K. An introduction to some non-parametric generalizations
of analysis of variance and multivariate analysis. Biometrika, 1956, 43, 361-376. 

[21] Siegel, S. Non-parametric statistics for the behavioral sciences. New York: McGraw- 
Hill, 1956. 

[22] Smith, K. Distribution-free statistical methods and the concept of power efficiency. 
In L. Festinger and D. Katz (Eds.), Research methods in the behavioral sciences. New 
York: Dryden, 1953. Pp. 536-577. 

[23] Sukhatme, B. V. On certain two-sample nonparametric tests for variances. Ann. 
math. Statist., 1957, 28, 188-194. 

[24] Swed, Frieda S. and Eisenhart, C. Tables for testing randomness of grouping in a 
sequence of alternatives. Ann. math. Statist., 1943, 14, 66-87. 

[25] Wald, A. and Wolfowitz, J. On a test whether two samples are from the same popu- 
lation. Ann. math. Statist., 1940, 11, 147-162. 

[26] Walker, Helen M. and Lev, J. Statistical inference. New York: Holt, 1953. 

[27] White, C. The use of ranks in a test of significance for comparing two treatments. 
Biometrics, 1952, 8, 33-41. 

[28] Whitney, D. R. A comparison of the power of nonparametric tests and tests based 
on the normal distribution under non-normal alternatives. Unpublished doctoral 
dissertation, Ohio State Univ., 1948. 

[29] Wilcoxon, F. Individual comparisons by ranking methods. Biometrics Bull., 1945, 1, 
80-83. 

[30] Wilcoxon, F. Probability tables for individual comparisons by ranking methods. 
Biometrics, 1947, 3, 119-122.

[31] Wilson, K. V. A distribution-free test of analysis of variance hypotheses. Psychol. 
Bull., 1956, 53, 96-101. 


Manuscript received 7/18/57 
Revised manuscript received 11/25/57 

BOOK REVIEWS 


LEE J. CRONBACH AND GOLDINE C. GLESER. Psychological Tests and Personnel Decisions.
Urbana, Illinois: University of Illinois Press, 1957, pp. xii + 165. $3.50. 


Decision theory has created quite a stir in mathematical statistics, economics, and 
some areas of psychology. In decision theory statisticians have found a comprehensive 
logical framework for established statistical procedures. Economists have profitably used 
both decision theory and its brother, game theory, in analyzing complex competitive re- 
lationships. Psychologists who hoped that these theories could serve as a descriptive model 
of actual behavior have made less headway, since people seldom behave optimally as the 
theories require. The personnel psychologist, however, aspires to “optimum” behavior in
his selection and placement procedures, so his problems seem made to order for decision
theory. Cronbach and Gleser hope to “stir up the reader’s thoughts” about this possibility.

A personnel decision process, as viewed in the monograph, starts with some infor- 
mation about an individual and a strategy for using this information to make either an 
investigatory or a terminal decision. If the decision is to investigate, e.g., get more test 
scores, the strategy is then applied to the augmented information. If the decision is to 
terminate, the individual is assigned to one of two or more treatments, e.g., accept or 
reject. The result of the treatment is an outcome, or payoff, that must be evaluated on a 
scale of utility. The problem is to choose the strategy that yields the largest expected 
utility. In the analysis it is assumed that the test information is available as a single score, 
which may be a composite, or as a set of orthogonal factor scores called “aptitudes.” It 
is also assumed that for any particular treatment, the expected payoff is a linear function 
of aptitude. Strategies are evaluated in terms of the increase in utility over that resulting 
from the best a priori strategy. With these ground rules, Cronbach and Gleser proceed to 
analyze several special decision processes. 

In considering the standard problem of selection with single-stage testing, “fixed-treatment”
and “adaptive-treatment” procedures are discriminated. In the former, the
selected men are treated in a predetermined way, e.g., all are given a job with fixed speci- 
fications or are admitted to a course taught in a predetermined way. In adaptive treatment, 
alternative treatments, e.g., job specifications or teaching methods, are specified; both 
the men and the treatment are chosen optimally, so that the treatment is adapted to the 
particular men available. Utility increase is shown as a joint function of validity and 
selection ratio for each procedure. Under the assumptions stated, utility is a linear function 
of validity in the fixed-treatment case—a conclusion that the authors use as a stick for 
beating upon the coefficient of determination and the index of forecasting efficiency. 

In addition to the single-stage decision process, the problems of placement, classi- 
fication, and two-stage sequential selection are treated in some detail. Fixed and adaptive 
treatments are described for each; the effects of validity, selection ratio, and other variables 
are presented. Some problems of optimum test length are discussed; under the heading 
“The Bandwidth-Fidelity Dilemma,” the authors consider the design and evaluation of 
tests or test batteries that are to be used in making several different decisions. It is shown 
that the concept of utility allows a test to be evaluated compositely over all decisions to 
which it contributes, rather than piecemeal via several validity coefficients. Utility also 
provides a criterion for choosing between many short tests of different aptitudes (band- 
width) and one or two long, reliable tests (fidelity). Finally, the book includes some sug- 
gestions for evaluating outcomes, and hence measuring utilities. While this last problem 
is far from solved, the authors argue cogently that at least utility measurement is no more 
subjective and arbitrary than conventional evaluation procedures, and that utility has the 
advantage of being explicit. 

The book is well organized and has many pertinent figures. Ample references are 
made to the relevant literature. Most of the mathematical development has been placed 
in a series of appendices, while the main text states the problems and discusses the results. 
Nevertheless it is a book for specialists, assuming knowledge of psychometrics and personnel 
psychology, and requiring some mathematical sophistication. The book has been produced 
by photo-offset or its equivalent, but the unjustified right margins are not distracting; 
the equations and the text have been very carefully prepared. On the other hand, the small 
type and the soporific style combine to dull the stimulating effect of the new ideas. 

It is clear that the authors view their book as more than a stirring rod—indeed 
they hope that it is the harbinger of a new test theory. Conventional test theory has focussed 
attention on the test score. Most of the chapters in Gulliksen’s Theory of Mental Tests 
are concerned with the properties, meaning, and interpretation of test scores. Cronbach 
and Gleser have focused attention on outcomes. Their extensive analysis of personnel 
problems is from a view point that may be characterized as “Validity for What?” Their 
success in dealing with a wide variety of problems in a single framework is impressive. 
Decision theory lends coherence to a diverse testing literature focused on outcomes. If 
enough professional testers manage to read the book, there is a good chance that the authors’ 
hopes for a new test theory will be realized. 

Bert F. GREEN, JR. 
M.I.T. Lincoln Laboratory 


JOE K. ADAMS, Basic Statistical Concepts. New York: McGraw-Hill Book Company,
1955, pp. xvii + 304. 


At a time when mathematical statisticians are writing “know-no-mathematics” texts
for budding research workers and any others who will read them, here is a first text written
by a social scientist in which there is no pretense of avoiding mathematics. Adams states
two purposes for his book: (1) “to develop some basic mathematico-logical concepts of
statistics, particularly the logic of statistical inference” and (2) “to develop an understanding
of the language used in mathematical statistics, including elementary calculus.”
The book is intended “primarily as a text for a one- or two-semester course for students
who have had little or no previous calculus or statistics.” Adams’ premise is “that the
college student, whether oriented toward applications or toward mathematics, can best
spend his time and energy in mastering some of the abstract concepts, i.e., mathematical
models, and some of the mathematical language of the field.”

These are some challenging ideas but first, just what has Adams done to implement 
this point of view? Without too much injustice, the main text of the book can be divided 
into three sections: the first on basic statistical concepts (actually the title of the book), a 
second exhibiting the notions of differentiation and integration within the context of con- 
tinuous distributions, and a final section dealing with normal, chi square, t, F, and bivariate 
distributions and their applicability in statistical inference. 

Most of the concepts peculiar to elementary statistics, with the exception of corre- 
lation and regression, are found in the first four chapters of the book. The approach to 
these ideas and the order in which they are developed are interesting and sometimes novel. 
For example, population is defined as “a value function, that is a class of ordered pairs
such that the second member of each pair is a member of a set and the first member of
the pair is the value of that member of the set.” When the topic of statistical inference
is developed, it is done using only finite populations. After some instruction in how to 
count using combinatorial methods, but before any formal consideration of measures of 
central tendency and variability, one meets tests of significance, confidence intervals, 
Type I and Type II errors, and factors influencing the power of a test.

The vehicle for the introduction to statistical inference is a single, lengthy chips-in- 
bowl problem. It is instructive and clever, but placing so much weight on a single example 
has an unfortunate result. It contributes to the general vagueness of the distinctions 
between testing of hypotheses and determining of confidence intervals. Adams uses the 
terms confidence level and significance level interchangeably and nowhere in the book does 
he refer to confidence coefficient, critical region, or null hypothesis, all terms which could 
be used to increase the clarity of the distinctions. 

Even in the early pages of the book the approach and style are much more terse 
than one finds in other books directed at students of the same background. It takes only 
28 lines at the beginning of the first chapter to set forth the “problems dealt with by
statistics.” Definitions and theorems are stated formally and rigorously. Many theorems are
proved in the 211 pages of text. More difficult theorems are proved in a 36-page mathe- 
matical appendix which deals with topics such as multiple integration, moments, Tcheby- 
sheff’s inequality, properties of chi square distributions, and bivariate normal distributions. 

With some bowing to the fact that a student may know only the calculus which 
he has been taught a few chapters earlier, most of the last section of the book reads like an 
elementary text in mathematical statistics. The treatment of normal, t, F, chi square, and 
bivariate distributions is really much closer to what one would find in Cramér’s new ele- 
mentary book, The Elements of Probability Theory, than it is to any of the most popular 
competitors for the market of undergraduate (or even graduate) statistics courses in 
psychology departments. 

A serious criticism of certain chapters in the last section of the book is that they contain
too few worked examples. Though the chapter on t begins with its distribution in
terms of the gamma function and ends with a criterion for discarding exceptional observa- 
tions, nowhere in between does Adams work a single problem. Even the usually plentiful 
exercises are slighted in this chapter. There is only one exercise in which two independent 
groups are compared though there are two problems using related measures. In the chapter 
on the F distribution the only worked examples are a mixed-model problem and one showing
the use of Tukey’s procedure for comparing individual means.

After finishing this book one is inclined to ask whether he might use it as a text. 
Adams has done many things well. He defines terms with rigor. Many of the exercises are 
worth the price of the book to a teacher of statistics. Technical errors are few. A fresh 
treatment is given to many topics. 

Where, then, is the book potentially useful? A negative answer is that it is hard to 
see how it would fit into the undergraduate curriculum of most social science departments 
and, in particular, psychology departments. The day of the required statistics course for 
undergraduate majors in psychology is here, and the simple fact is that Adams’ book would 
be too difficult for many students. 

In part this critical comment is a consequence of the general theoretical approach 
to statistics, and in part it is a consequence of the particular way in which Adams ex- 
emplifies this approach. Adams ties rigor and brevity. These should be and are linked in 
statements of definition and theorems. The difficulty is that the brevity at these points 
too often extends into the nonmathematical expository material. Illustrations of this 
point are numerous. The treatment of the median takes five lines and is completely void 
of any discussion of the measure. Almost any additional material would have helped get 
across the notion of the power of a test of significance. In the chapter on bivariate distri- 
butions you have to go all the way to the exercises at the end of the chapter before you get 
any idea that regression and correlation relate to something more significant than chips 
in a bowl. 

But what of the general approach apart from its exemplification in Adams’ book?
Is it a good premise that first training in statistics is best spent on mastering mathematical
models and mathematical language in the field? It is too bad that the answer to this question
does not follow as neatly as some of the proofs in the book. Certainly no answer can
be given which transcends time and individual differences, but the point of view taken 
here is that there are better ways of teaching elementary statistics in social science de- 
partments. Asking students to learn calculus (in a statistics course) so that they may better 
understand statistics is a little like asking them to learn statistics in a foreign language. 
The concepts of calculus must be pretty well in hand if they are to be of any real conceptual 
aid to the student of statistics. 

Another difficulty associated with this general approach is that it does not fit har- 
moniously with the growing philosophy that a course in statistics should be a part of a well- 
rounded education. So much emphasis on distribution theory and on calculus takes space 
and time from the uniquely statistical concepts and their wide applicability in society 
today.

This book is probably best suited to a theory course as distinguished from the methods
courses most often taught by psychologists. The usual calculus prerequisite for theory 
courses should make the book much more comprehensible to students. Some theory teachers 
might even feel that it is too easy. In a theory course students should be able to integrate 
and differentiate with some skill, not just to understand the concepts of integration and 
differentiation. 

Neither the book nor any review is likely to change many opinions on how the first
course in statistics should be taught. Despite the reviewer’s disagreement with the author
on this issue it is strongly recommended that those who prefer the mathematical approach
consider this book.

ROBERT E. MORIN
University of Texas


Garrett, Henry E. Elementary Statistics. New York: Longmans, Green and Company, 
1956, pp. vii + 167. 


This little book (146 pages, exclusive of tables) is a shorter and more elementary 
version of an earlier introductory text by the same author. The book contains chapters 
on frequency distributions, averages, variability, percentiles, the normal curve, testing 
hypotheses, correlation, chi square, and comparing and combining test scores. 

The emphasis throughout the book is on computational mechanics. A large proportion
of the pages is devoted to step-by-step methods for solving problems, worked examples, 
and tables and graphs requiring considerable elaboration in the text. There are relatively 
few interludes in which the student is told why all these calculations are important, what 
assumptions are involved, and the kinds and generality of the interpretations he can make. 

One of the most important functions, if not the most important function, of an elementary
statistics text is to introduce the basic concepts of statistical inference. Throughout this
book one finds misleading statements, omissions, and errors, but nowhere are they more 
numerous than in the chapter on testing hypotheses. 

A central difficulty in this chapter is the lack of clear and explicit distinctions be- 
tween statistics and parameters. This leads to confusion when null hypothesis is defined. 
On page 97 the text reads, “The null hypothesis in its most common form asserts flatly 
that the true mean difference between the groups being compared is zero; and that the 
obtained difference (if one has been found) is inconsequential and could well be zero.” 
This error of regarding a null hypothesis as a statement about a statistic is closely related 
to the failure to regard testing of a hypothesis as a decision problem with two possible 
kinds of errors. The treatment of Type I errors is inadequate. Type II errors are never 
mentioned. The student using this book would have a hard time answering a question on 
why the risk of a Type I error should not be made as small as possible. 
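
The question of why the risk of a Type I error should not simply be made as small as
possible can be illustrated numerically. The following is a minimal sketch and is not drawn
from Garrett's text: it assumes a one-sided z test of a mean with known standard deviation,
and the function name `type_ii_risk`, the effect size, and the sample size are hypothetical
choices made only for illustration.

```python
from statistics import NormalDist


def type_ii_risk(alpha, effect, sigma, n):
    """Type II risk (beta) of a one-sided z test of H0: mu = 0 against mu = effect > 0.

    Assumes a normal population with known sigma. All argument values used
    below are illustrative assumptions, not figures taken from the book.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)   # rejection cutoff in standard-error units
    shift = effect / (sigma / n ** 0.5)        # true mean difference in standard-error units
    return std_normal.cdf(z_crit - shift)      # P(fail to reject | alternative is true)


# Hold effect size, sigma, and n fixed and tighten alpha step by step.
for alpha in (0.10, 0.05, 0.01, 0.001):
    beta = type_ii_risk(alpha, effect=0.5, sigma=1.0, n=16)
    print(f"alpha = {alpha:<6} beta = {beta:.3f}")
```

With the other quantities held fixed, each reduction in the Type I risk raises the Type II
risk, which is the trade-off a student would need in hand to answer the question.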










When sampling is considered in this chapter, the examples given to illustrate simple 
random sampling are really cases of more complex stratified and systematic sampling 
schemes. Three sentences from page 91 illustrate this problem. “Various devices, not all 
of which apply in every situation, have been employed to guarantee a random sample. 
In the problem stated above, for example, the experimenter would try to select children 
proportionally from all the elementary schools in the city, thus including all intellectual 
and socioeconomic levels. When the population is on file (telephone directory or civil 
service list) every twentieth or even five-hundredth name might be chosen.” The problem 
of sampling is further confused by statements like the following one found on page 90. 
“In order to infer from the performance of a sample (its M, for instance) what performance
can be expected from the population, the sample must be representative of its population.” 
Were this true, of course, either all random samples would have to be called representative 
samples or we could not make inferences from any random samples. 
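
The difference between taking every k-th name from a file and drawing a simple random
sample can be made concrete with a small enumeration. This is a sketch under stated
assumptions: the six-name list, the interval of three, and the helper `systematic_samples`
are hypothetical and do not come from Garrett's text.

```python
from itertools import combinations


def systematic_samples(population, step):
    """All samples obtainable by taking every `step`-th unit after a random start."""
    return [tuple(population[start::step]) for start in range(step)]


names = ["A", "B", "C", "D", "E", "F"]   # hypothetical population "on file"
step = 3                                  # every third name, so each sample holds 2 names

systematic = systematic_samples(names, step)
simple_random = list(combinations(names, 2))   # under simple random sampling every pair is equally likely

print("possible systematic samples:   ", systematic)
print("possible simple random samples:", len(simple_random))
```

Under the systematic scheme only three of the fifteen possible pairs can ever be drawn,
whereas simple random sampling gives every pair the same chance of selection.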

Several other criticisms of this chapter might be listed. Critical ratio and t are used
interchangeably. “Significant differences” are equated with “real” differences and are
presented as the opposite of differences of no consequence. Correlated percentages are 
compared as though they were independent. 

Even though some other chapters in this book are better than the chapter on testing 
hypotheses, it is not possible to recommend this book for use in elementary classes when 
so many more suitable books are available.


ROBERT E. MORIN
University of Texas


W. ALLEN WALLIS AND HARRY V. ROBERTS. Statistics: A New Approach. Glencoe: Free
Press, 1956, Pp. xxxviii + 646, $6.00. 


There are really several new approaches used in this book. These include the use of 
actual examples from many fields to show the universality of statistics, the teaching of 
statistics as a field in its own right, the introduction of nonparametric techniques, and the
elimination of Student’s t and the F ratio from their accustomed place. The pleasing style
and apparent freedom from numerical mistakes are also novel—or at least to be encouraged. 

This book is divided into four major categories: The Nature of Statistics, Statistical 
Description, Statistical Inference, and Special Topics. Each section includes about 150 
pages, so the book is extraordinarily long—too long to be covered in a one semester course 
in statistics. 

The authors talk of the avoidance of mathematics as a necessity, and, in fact, a virtue. 
Some simple algebra is essential, however, and after 200 pages of words, the numbers finally 
come up. Students who took and understood high school algebra will have no trouble with 
the mathematics in this book; those who did not understand it get no magic release from 
plus and minus signs.

There are some 215 examples in the book, ranging in title from Age and Sexual Activity 
to Seasonal Patterns of Lake Level, Lake Michigan-Huron. The reviewer found most of them 
very interesting, but his class agreed vehemently that the sheer number of examples slowed 
them down too much and made the book, particularly the first part, too long. An interesting 
innovation is the numbering of problems, tables, and examples from the page in the book 
on which they are found. 

In elementary psychological statistics, the significance of the difference between 
means is one of the most, if not the most, important of methods to be learned. To teach this 
in its logical place after the standard deviation, one must pick the standard error of the 
mean from Chapter 11, the null hypothesis from Chapter 12, and the significance of the 
difference between means from Chapter 13. 







The null hypothesis is covered rather too completely for an elementary text, with 
operating characteristic curves probably out of place. Why analysis of variance is brought 
up at all in Chapter 13 is not clear—especially since the F test is played down. Correlation,
on the other hand, which certainly is a subject which should be covered extensively in a 
course in psychological statistics, is one of the special topics. It is introduced as a ratio 
of the standard error of estimate to the standard deviation. No handy computational 
formulas for r are given. 
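
The link between r and the ratio of the standard error of estimate to the standard deviation
can be checked numerically. The review does not reproduce the authors' formula, so this
sketch assumes the standard least-squares identity, with divisor n in both quantities:
the squared ratio equals 1 - r^2. The data and the function names `pearson_r` and
`see_ratio` are hypothetical.

```python
from math import sqrt


def _sums(x, y):
    """Means and centered sums of squares and products for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Product-moment correlation computed from centered sums."""
    _, _, sxx, syy, sxy = _sums(x, y)
    return sxy / sqrt(sxx * syy)


def see_ratio(x, y):
    """Standard error of estimate of y on x divided by the SD of y (same divisor in both)."""
    mx, my, sxx, syy, sxy = _sums(x, y)
    slope = sxy / sxx
    intercept = my - slope * mx
    residual_ss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return sqrt(residual_ss / syy)


x = [1, 2, 3, 4, 5]   # illustrative data only
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
ratio = see_ratio(x, y)
print(f"r = {r:.3f}, ratio = {ratio:.3f}, 1 - ratio^2 = {1 - ratio ** 2:.3f}, r^2 = {r ** 2:.3f}")
```

A reader who wants a handy computational route to r could therefore recover it, up to sign,
from the ratio, although a direct formula of the kind shown in `pearson_r` is simpler.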

As may be inferred from the foregoing, this is no cookbook—nor is it a book where 
the occasional user of statistics can look for a formula or table. Perhaps the addition of 
an appendix with computational formulas and more tables would make this a more valuable 
book. 

This book seemed less than ideal for a one-term introduction to psychological statistics. 
It is an outstanding book for: (1) a full-year course in statistics, particularly an inter- 
departmental course; (2) learning statistics extracurricularly; (3) use as a source book 
for nonparametric techniques and for a wealth of examples. 

CHARLES L. WOOD
University of Texas


JEROME S. BRUNER, JACQUELINE J. GOODNOW, AND GEORGE A. AUSTIN. A Study of Thinking.
New York: John Wiley and Sons, Inc., 1956. pp. xi + 330. $5.50. 


There are perhaps two possible reasons for reviewing a book in Psychometrika—one 
being that the work is an example or discussion of quantification in psychology, and the 
other being that the work is provocative for the use of quantitative methods and models. 
A Study of Thinking is the latter kind of work. It is a highly verbal book offering much 
interesting material which might be incorporated in a rigorous approach to problem- 
solving behavior. 

At the outset the authors state: “The learning and utilization of categories represents
one of the most elementary and general forms of cognition by which man adjusts to his
environment. It was in this belief that the research reported in this volume was undertaken.
For it is with the categorizing process and its many ramifications that this book is principally
concerned” (p. 2). “Categorizing” is defined as behavior involving the placement or
grouping of objects or events on the basis of selected cues. A distinction is made between
category (or concept) formation, which is “the inventive act by which classes are constructed,”
and concept attainment, which is the search for and testing of attributes of
objects and events in order to distinguish different categories. For example, the work of a
physicist in distinguishing between substances that undergo fission is not to form the concepts
“fissile” and “nonfissile” but to determine the attributes or properties that are
associated with fissile and nonfissile substances. Concept attainment as analyzed by the 
authors consists of the following aspects: (1) There is an array of instances to be tested 
which can be characterized in terms of attributes, e.g., color, weight, etc. (2) As instances 
are encountered, a person makes decisions whether the sample before him is in one category 
or another. (3) Any given decision will be found to be correct, incorrect, or indeterminate. 
This is called “validation” of a decision. (4) Each decision provides potential information
by limiting the number of attributes and attribute values that can be considered predictive
of category membership. (5) The sequence of decisions made on the way to concept attainment
is considered as a strategy which has certain objectives, i.e., to maximize the information
obtained from each decision and test, to keep the “cognitive strain” (undue strain
on memory or inference) within limits, and to regulate the risk of failing or other decision 
consequences. It is the description of behavior of this kind that is the task of this book. 

“To the reader conversant with contemporary American psychology,” the authors
state at the end of Chapter 1, “the book will appear singularly lacking in the more familiar
forms of theoretical discourse. Neither the language of learning theory, of Gestalt theory, 
nor of psychoanalysis will be evident save in the form of incidental reference. For our 
objective has not been to extend reinforcement theory or the theory of traces or any other 
prepared psychological position to the problems of categorizing. We have not ignored the 
rich theoretical backgrounds of contemporary theory. Rather, we have come gradually 
to the conclusion that what is most needed in the analysis of categorizing phenomena ... is
an adequate analytic description of the actual behavior that goes on when a person learns
how to use defining cues as a basis for grouping the events of his environment” (p. 23).
This statement reveals two major characteristics of A Study of Thinking. First, there is a 
de-emphasis of the use of available language which has the disadvantage of contributing 
another set of definitions and explanations in contrast to a contribution toward consolidating 
our present explanations. Secondly, the lack of systematic position often makes the very 
interesting variables that are considered seem like a stimulating pot pourri rather than a 
systematically generated set of variables for study. 

Three category types are distinguished. (1) Conjunctive—defined by the joint presence 
of the appropriate values of several attributes, e.g., in a set of cards containing combinations 
of varying values of several attributes, the category defined by number, redness, and 
circles (all cards containing three red circles) is a conjunctive category. Most uses of the 
Vigotsky Test and Wisconsin Card Sorting Test employ this kind of categorizing. (2) 
Disjunctive—defined by the presence of appropriate values of several attributes or any 
constituent thereof, e.g., the possession of three red circles or three figures, red figures, 
circles, three red figures, red circles, or three circles. (3) Relational—defined by a specifiable 
relationship between attributes that define a category, e.g., the same number of figures as 
colors or fewer figures than colors. Following the statement of these definitions, Chapter 3, 
entitled “The Process of Concept Attainment,” is a rich and meaty chapter in which the 
authors are at their best in the sense that they present, on the basis of past work and their 
own insights, what appear to be important variables to be investigated in the study of 
concept attainment. The technique employed, essentially, is to look at examples of every- 
day behavior and to analyze these situations in keen detail. 
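
The conjunctive and disjunctive category types defined above can be written as simple
membership rules over the reviewer's card example. This is only a sketch: the `Card`
record and its three fields are hypothetical simplifications, and the relational type is
left out because the review does not describe the cards in enough detail to model it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    number: int   # how many figures appear on the card
    color: str    # color of the figures
    shape: str    # shape of the figures


def conjunctive(card):
    """Member only when all three defining values are present together."""
    return card.number == 3 and card.color == "red" and card.shape == "circle"


def disjunctive(card):
    """Member when any one of the defining values, or any combination, is present."""
    return card.number == 3 or card.color == "red" or card.shape == "circle"


cards = [
    Card(3, "red", "circle"),     # satisfies both rules
    Card(3, "green", "square"),   # three figures only: disjunctive member, not conjunctive
    Card(1, "black", "cross"),    # satisfies neither rule
]
for card in cards:
    print(card, "conjunctive:", conjunctive(card), "disjunctive:", disjunctive(card))
```

Every conjunctive member is also a disjunctive member, but a single instance tells the
subject much less about which attribute matters under the disjunctive rule.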

Chapters 4 through 7, approximately 60 per cent of the book (not counting the 
appendix), are devoted to presenting “... a series of several dozen experimental studies of
concept attainment and concept utilization ... constructed to help us understand what is 
involved when a person learns to group discriminably different things in his environment 
into equivalence classes and to recognize new members of these classes without any further 
learning” (p. 80). The aim of the authors is to make overt certain features of problem- 
solving behavior and this is ingeniously done. For the most part, however, the experiments 
referred to are not completely rigorous ones; dependent variables are not always specifically 
identified, and tests of statistical significance are not presented when group differences pro- 
vide major findings for discussion. In general the experiments provide a means of observ- 
ing the performance of subjects on the basis of which the authors perform an insightful sort 
of task analysis of important variables under different task requirements. 

With respect to the leads which A Study of Thinking suggests for quantitatively-minded
psychologists, the writer of this review submits the following points.

1. The authors employ the concept of the ideal rational strategy using notions of 
maximizing utility and payoff matrices analogous to a game theory analysis. Their in- 
vestigation of categorizing behavior consists essentially of studying the conditions that 
affect the use of these ideal strategies. Of interest here is the development of theoretical
models which can generate these strategies. It may be that a fruitful approach would be 
the development of a quantitative theory which starts with some assumptions similar to 
the rational-man notions of game theory and then goes on to include behavioral variables 
such as are pointed up in this book, e.g., cultural biases, expectations about what constitutes
successful solution, patterns of success (reinforcement schedule), and difficulty of
inference. In regard to the use of models the authors state the general conclusion that 
“... we are not prepared to develop or utilize as yet any formal or mathematical model
to predict the effect of anticipated consequences on categorizing judgments. We have
chosen to be satisfied with less precise prediction and to concern ourselves with the psychological
questions which must eventually underlie any model” (p. 77). In contrast to this is
the current attitude of many psychologists that the use of formal models has heuristic 
value and can contribute to the identification and description of the “psychological ques- 
tions” with which they are concerned. 

2. In considering categorization with probabilistic cues, i.e., situations in which 
“certainty of inference from defining attributes to categorical identity cannot be achieved,” 
the authors mention the similarities of their experimental situations to the well-known 
Humphreys’ “guessing” situation. (They point out, however, that the increased stimulus
complexity of their experiments provides more opportunity to observe the kind and number
of cues to which the subject is attending.) The event-matching behavior observed by 
Humphreys and in subsequent experiments by Grant, Estes, and others is explained by the 
authors primarily in terms of the hope of a unique solution, the need for a direct test of 
the hypothesis, and the tendency of subjects to regard it as more skillful to predict the least 
frequent alternatives. The “matching law,” as it has been called by Estes, has been a focus of
statistical learning theory in describing such verbal learning situations. It seems then 
that this kind of quantitative theory should be scrutinized for its applications to the 
investigation of the behavior studied in this book. 

3. Much is made of the consequences of categorizing in terms of value or utility. 
The authors suggest that the scaling of such judgments might be a profitable enterprise in 
the analysis of categorizing behavior. 

4. The authors state that “... it is our feeling that the unit of analysis now called
the ‘response’ will have to be broadened considerably to encompass the long, contingent
sequence of acts that, more properly speaking, can only be called a ‘performance’” (pp. 55f.).
In this regard it would be interesting to consider analyses similar to those used to study the 
informational characteristics of sequential dependencies in language. In another context, 
some formal analyses have been made of the behavioral sequences in “trouble-shooting” 
problem solving with electronic equipment. This work seems relevant here. 

5. The book provokes thought on the relationships between personality variables 
and problem-solving approaches. This offers interesting measurement problems to the test 
constructor in connection with problem-solving aptitude. The writer is currently engaged 
in the investigation of relationships between various measures of “rigidity” and proficiency
in solving routine and novel problems. 
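
The event-matching behavior mentioned in point 2 above can be contrasted with an ideal
strategy by a short calculation. This is a sketch under assumptions not stated in the
review: a two-choice guessing task, independent trials, and a subject who predicts the more
frequent event on a proportion of trials equal to its probability p. The function names are
hypothetical.

```python
def matching_accuracy(p):
    """Expected proportion correct when the event with probability p is
    predicted on a proportion p of trials, independently of the outcomes."""
    return p * p + (1.0 - p) * (1.0 - p)


def maximizing_accuracy(p):
    """Expected proportion correct when the more frequent event is always predicted."""
    return max(p, 1.0 - p)


for p in (0.5, 0.6, 0.75, 0.9):
    print(f"p = {p:.2f}  matching = {matching_accuracy(p):.4f}  "
          f"maximizing = {maximizing_accuracy(p):.4f}")
```

Under these assumptions matching never does better than always predicting the more frequent
event, which is why its persistence calls for the kind of explanation the authors offer.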

Finally, it should be said that the book is stimulating reading. It contains extremely 
thoughtful analyses of behavior usually called problem solving. Many variables which 
appear to be highly relevant for establishing the necessary functional relationships for the 
scientific description of this behavior are richly described. On the other hand, the book is 
somewhat wordy and stretched out in parts. The appendix by Roger W. Brown discusses 
language as a system of categories and is only somewhat related to the mainstream of the 
book. One wonders whether the work reported should not have been published in monograph 
form rather than as a book. 

ROBERT GLASER
University of Pittsburgh 
American Institute for Research 














